ABSTRACT

We identify galaxy groups and clusters in volume-limited samples of the SDSS redshift survey, 
using a redshift-space friends-of-friends algorithm. We optimize the friends-of-friends linking lengths 
to recover galaxy systems that occupy the same dark matter halos, using a set of mock catalogs created 
by populating halos of N-body simulations with galaxies. Extensive tests with these mock catalogs 
show that no combination of perpendicular and line-of-sight linking lengths is able to yield groups and 
clusters that simultaneously recover the true halo multiplicity function, projected size distribution, and 
velocity dispersion. We adopt a linking length combination that yields, for galaxy groups with ten or 
more members: a group multiplicity function that is unbiased with respect to the true halo multiplicity 
function; an unbiased median relation between the multiplicities of groups and their associated halos; 
a spurious group fraction of less than ~ 1%; a halo completeness of more than ~ 97%; the correct 
projected size distribution as a function of multiplicity; and a velocity dispersion distribution that is 
~ 20% too low at all multiplicities. These results hold over a range of mock catalogs that use different 
input recipes for populating halos with galaxies. We apply our group-finding algorithm to the SDSS 
data and obtain three group and cluster catalogs for three volume-limited samples that cover 3495.1 
square degrees on the sky, go out to redshifts of 0.1, 0.068, and 0.045, and contain 57138, 37820, and 
18895 galaxies, respectively. We correct for incompleteness caused by fiber collisions and survey edges, 
and obtain measurements of the group multiplicity function, with errors calculated from realistic mock 
catalogs. These multiplicity function measurements provide a key constraint on the relation between 
galaxy populations and dark matter halos. 

Subject headings: cosmology: large-scale structure of universe — galaxies: clusters 



1. INTRODUCTION 

Galaxies are gregarious by nature. Bright galaxies typ- 
ically reside in groups or clusters, surrounded by less 
luminous neighbors. Interactions within the group or 
cluster environment may have important effects on the 
star formation history, morphology, dynamics, and other 
properties of member galaxies. Characterizing the re- 
lation between galaxy properties and their group envi- 
ronment is thus a key step in understanding galaxy for- 
mation and evolution. At the density thresholds often 
used to identify groups, most members should belong 
to the same, gravitationally bound dark matter (DM) 
halo. Recent approaches to describing the relation 
between galaxies and DM focus on galaxy populations 
of DM halos as a function of halo mass. Specifically, 
the bias of a particular class of galaxies can be char- 
acterized by its Halo Occupation Distribution (HOD), 
which specifies the probability distribution $P(N|M)$ that 
a halo of mass M contains N such galaxies, together 
with relations describing the relative spatial and velocity 
distributions of galaxies and dark matter within halos 
(Berlind & Weinberg 2002, and references therein). A 
well defined group catalog with well understood proper- 
ties can play a central role in the empirical determination 
of this relation. 

Throughout this paper, we use the term "halo" to refer to 
a gravitationally bound structure with overdensity $\rho/\bar{\rho} \sim 200$, so 
an occupied halo may host a single luminous galaxy, a group of 
galaxies, or a cluster. Higher overdensity concentrations around 
individual galaxies of a group or cluster constitute, in this terminology, 
halo substructure, or "sub-halos". 

This paper presents a group and cluster catalog 
defined from the Sloan Digital Sky Survey (SDSS, 
York et al. 2000). While this catalog is useful for 
many purposes, our overriding objective is to obtain a 
well understood measurement of the group multiplic- 
ity function (the space density of groups as a func- 
tion of richness), with the goal of determining the 
HOD in the high mass regime (Peacock & Smith 2000; 
Berlind & Weinberg 2002; Marinoni & Hudson 2002; 
Kochanek et al. 2003; Lin et al. 2004). With this objective 
in mind, we have adopted a simple group-finding 
algorithm, friends-of-friends in redshift space 
(Huchra & Geller 1982), and carried out extensive tests 
on realistic mock catalogs in order to assess its per- 
formance and optimize parameter choices. We apply 
the group-finding algorithm to volume-limited samples of 
galaxies so that the resulting group statistics characterize 
the clustering of well defined populations of galaxies. 

Galaxy clusters have been the focus of study since 
they were first seen on optical photographic plates 
(Shapley & Ames 1926). Zwicky (1937) pioneered the 
study of clusters as dynamical objects by using imaging 
and spectroscopy of the Coma cluster to estimate 
its mass. However, the most influential pioneering work 
on clusters was done by Abell (1958), who assembled 
the first large sample of galaxy clusters. The Abell 
catalog of rich galaxy clusters (Abell 1958; Abell et al. 
1989) was created by eyeball identification in the Palomar 
Observatory Sky Survey and it spawned numerous 
follow-up studies. de Vaucouleurs (1971) shifted focus 
to poorer systems by studying nearby groups of galaxies. 
Gott & Turner (1977) made the first measurement of 
the group multiplicity function using the Turner & Gott 
(1976) catalog of groups selected based on the projected 
surface density of galaxies. 

With the advent of large redshift surveys, group identification 
became three dimensional and thus less subject 
to projection effects. Group-finding in redshift 
space was pioneered by Huchra & Geller (1982) and 
Geller & Huchra (1983), using the Center for Astrophysics 
(CfA) redshift survey. Subsequent versions of the 
CfA redshift survey were used to identify groups by various 
authors (Nolthenius & White 1987; Ramella et al. 
1989; Moore et al. 1993; Ramella et al. 1997). Other 
redshift surveys that spawned group catalogs were the 
Nearby Galaxies Catalog (Tully 1987), the ESO Slice 
Project (Ramella et al. 1999), the Las Campanas Redshift 
Survey (LCRS) (Tucker et al. 2000), the Nearby 
Optical Galaxy Sample (NOG) (Giuricin et al. 2000), the 
Southern Sky Redshift Survey (SSRS2) (Ramella et al. 
2002), the 2dF redshift survey (Merchán & Zandivarez 
2002; Eke et al. 2004; Yang et al. 2005), and even the 
high redshift DEEP2 survey (Gerke et al. 2005). 

There have been several efforts to detect clusters in the 
SDSS to date, most of them using the photometric data 
rather than the redshift data. Annis et al. (1999) developed 
the maxBCG technique, where Brightest Cluster 
Galaxy (BCG) candidates are identified based on their 
colors and magnitudes and other cluster members are selected 
from nearby galaxies that have the colors of the 
E/S0 ridgeline. Kim et al. (2002) developed a hybrid 
matched filter (HMF) technique that assumes a radial 
profile for clusters and convolves the data with that filter. 
Goto et al. (2002) developed the cut-and-enhance 
(CE) method, which selects overdensities of galaxies that 
have similar colors. All these techniques were applied 
to the early SDSS commissioning data (Bahcall et al. 
2003; Goto et al. 2002). Lee et al. (2004) identified compact 
groups by looking for small and isolated concentrations 
of galaxies in the SDSS Early Data Release (EDR; 
Stoughton et al. 2002). Cluster searches in the SDSS 
redshift survey have also been carried out. Goto (2005) 
used a friends-of-friends algorithm (though with linking 
lengths that do not scale with the changing number density 
of galaxies due to the flux limit) to identify clusters in 
the SDSS Data Release 2 (DR2; Abazajian et al. 2004). 
Merchán & Zandivarez (2005) used a friends-of-friends 
algorithm to identify groups in the SDSS Data Release 3 
(DR3; Abazajian et al. 2005a). Weinmann et al. (2006) 
used the Yang et al. (2005) algorithm to identify groups 
in SDSS DR2. Miller et al. (2005) developed the C4 algorithm 
for finding clusters in redshift space and also 
applied it to the SDSS DR2. The C4 algorithm looks 
for concentrations of galaxies in a seven-dimensional po- 
sition and color space. It takes advantage of the color 
similarity of cluster member galaxies and thus minimizes 
contamination due to projection. However, some correla- 
tions are built into the method, and modeling it in order 
to understand the properties of the resulting cluster cat- 
alog requires a complete model of the galaxy population 
(including colors and luminosities). Our method com- 
plements the C4 catalog by applying a simple and easily 
modeled algorithm to volume-limited samples with ho- 
mogeneous properties. 

In § 2 we describe the SDSS data that we use. In § 3 
we describe the mock catalogs that we use to optimize 
our group-finder and to estimate uncertainties for our 
measured group statistics. In § 4 we outline our group-finding 
algorithm and choice of parameters. We present 
a detailed discussion of tests with mock catalogs in the 
Appendix, with the key points summarized in the main 
text. We discuss incompleteness in our group catalogs 
due to fiber collisions and survey edges in § 5. The group 
catalogs are published in electronic tables and their contents 
are described in § 6. Finally, in § 7 we present our 
measured group multiplicity function. We will use this 
to constrain the HOD in future work. We summarize our 
results in § 8. 

2. DATA 
2.1. SDSS 

The SDSS is a large imaging and spectroscopic survey 
that is mapping two-fifths of the Northern Galactic 
sky and a smaller area of the Southern Galactic sky, using 
a dedicated 2.5 meter telescope (Gunn et al. 2006) at 
Apache Point, New Mexico. The survey uses a photometric 
camera (Gunn et al. 1998) to scan the sky simultaneously 
in five photometric bandpasses (Fukugita et al. 
1996; Smith et al. 2002) down to a limiting r-band magnitude 
of ~ 22.5. The imaging data are processed 
by automatic software that does astrometry (Pier et al. 
2003), source identification, deblending and photometry 
(Lupton et al. 2001; Lupton 2005), photometric calibration 
(Hogg et al. 2001; Smith et al. 2002; Tucker 2005), 
and data quality assessment (Ivezić et al. 2004). Algorithms 
are applied to select spectroscopic targets for 
the main galaxy sample (Strauss et al. 2002), the luminous 
red galaxy sample (Eisenstein et al. 2001), and the 
quasar sample (Richards et al. 2002). The main galaxy 
sample is approximately complete down to an apparent 
r-band Petrosian magnitude limit of r < 17.77. Targets 
are assigned to spectroscopic plates using an adaptive 
tiling algorithm (Blanton et al. 2003c). Finally, spectroscopic 
data reduction pipelines produce galaxy spectra 
and redshifts. 

Table 1. Volume-limited Sample Parameters 

Name    z_min    z_max    M_r limit    N_gal    n_g
Mr20    0.015    0.100    -19.9        57138    0.00673
Mr19    0.015    0.068    -19.0        37820    0.01396
Mr18    0.015    0.045    -18.0        18895    0.02434

Note: Absolute magnitude thresholds (M_r limit) are listed for z_max. 
N_gal is the number of galaxies and n_g is the space density in 
units of h^3 Mpc^-3. 

We use the large-scale structure sample sample14 
from the NYU Value Added Galaxy Catalog (NYU-VAGC; 
Blanton et al. 2005) as our primary galaxy sample. 
Galaxy magnitudes are corrected for Galactic extinction 
(Schlegel et al. 1998) and absolute magnitudes 
are k-corrected (Blanton et al. 2003a) and corrected for 
passive evolution (Blanton et al. 2003b) to rest-frame 
magnitudes at redshift z = 0.1. A significant fraction 
of the sample that we use was made publicly available 
with the SDSS Data Release 3 (Abazajian et al. 2005a). 

The galaxy redshift sample has an incompleteness due 
to the mechanical restriction that spectroscopic fibers 
cannot be placed closer to each other than their own 
thickness. This fiber collision constraint makes it impos- 
sible to obtain redshifts for both galaxies in pairs that 
are closer than 55" on the sky. In the case of a conflict, 
the target selection algorithm randomly chooses which 
galaxy gets a fiber (Strauss et al. 2002). Spectroscopic 
plate overlaps alleviate this problem to some extent, but 
fiber collisions still account for a ~ 6% incompleteness 
in the main galaxy sample. Since this incompleteness is 
most severe in regions of high galaxy density, it is neces- 
sary to correct for it in studies of groups and clusters. We 
correct for fiber collisions by giving each collided galaxy 
the redshift of its nearest neighbor on the sky (usually 
the galaxy it collided with), and we show in § 5 that this 
procedure is adequate for our purposes. Putting collided 
galaxies at the redshifts of their nearest neighbors will 
cause some nearby galaxies to be placed at high redshift, 
artificially making their estimated luminosities very high. 
Since the abundance of highly luminous galaxies is low, 
this contamination can become a significant fraction of 
all highly luminous galaxies. For this reason, we also 
give collided galaxies the magnitudes (in addition to the 
redshifts) of their nearest neighbors. The resulting lumi- 
nosity distribution is thus unbiased. 
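
The nearest-neighbor correction described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the catalog: the function names, the array layout, and the choice to search only among galaxies that already have a measured redshift are assumptions made for the example.

```python
import numpy as np

def unit_vectors(ra_deg, dec_deg):
    """Convert sky coordinates in degrees to unit vectors on the sphere."""
    ra = np.radians(np.asarray(ra_deg, dtype=float))
    dec = np.radians(np.asarray(dec_deg, dtype=float))
    return np.column_stack([np.cos(dec) * np.cos(ra),
                            np.cos(dec) * np.sin(ra),
                            np.sin(dec)])

def correct_fiber_collisions(ra_deg, dec_deg, z, abs_mag, has_z):
    """Give each collided galaxy (has_z False) the redshift and absolute
    magnitude of its nearest neighbor on the sky that has a redshift."""
    xyz = unit_vectors(ra_deg, dec_deg)
    has_z = np.asarray(has_z, dtype=bool)
    z_out = np.array(z, dtype=float)
    mag_out = np.array(abs_mag, dtype=float)
    donors = np.flatnonzero(has_z)
    for i in np.flatnonzero(~has_z):
        # The largest dot product corresponds to the smallest angular separation.
        j = donors[np.argmax(xyz[donors] @ xyz[i])]
        z_out[i] = z_out[j]
        mag_out[i] = mag_out[j]
    return z_out, mag_out
```

Copying the magnitude together with the redshift mirrors the reasoning in the text: a nearby galaxy moved to a higher redshift would otherwise be assigned an artificially high luminosity.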

There is some additional incompleteness due to bright 
foreground stars blocking background galaxies, but this 
is at the ~ 1% level. In order to limit the effects of 
incompleteness on our group identification, we restrict 
our sample to regions of the sky where the completeness 
(ratio of obtained redshifts to spectroscopic targets) is 
greater than 90%. Our final sample covers 3495.1 square 
degrees on the sky and contains 298729 galaxies. 

In cases where a target galaxy fiber collides with a target 
quasar fiber, priority is always given to the quasar, but such collisions 
only constitute ~ 5% of all cases. 

2.2. Volume-limited Samples 

Fig. 1. Absolute r-band magnitude vs. redshift for galaxies 
in the SDSS redshift survey, highlighting the three volume-limited 
samples used for group identification. The three samples 
contain galaxies in the redshift ranges 0.015-0.1, 0.015-0.068, and 
0.015-0.045 and are complete for galaxies with r-band absolute 
magnitudes brighter than -19.9, -19, and -18, respectively. 
The absolute magnitude threshold for a given volume-limited sample 
evolves with redshift in order to account for passive luminosity 
evolution of the galaxy population. 

In this and subsequent papers, we are primarily inter- 
ested in using galaxy groups to constrain the properties 
of galaxies as a function of their underlying dark matter 
halo mass. It is therefore important that the popula- 
tion of galaxies constituting the groups is homogeneous 
within the sample volume. For this reason, we construct 
volume-limited subsamples of the full SDSS redshift sample 
that are each complete in a specified redshift range 
down to a limiting r-band absolute magnitude threshold. 
We construct each sample by choosing redshift limits 
$z_{\rm min}$ and $z_{\rm max}$, and only keeping galaxies whose evolved, 
redshifted spectra would still make the redshift survey's 
apparent magnitude and surface brightness cuts at the 
limiting redshifts of the sample. Since the apparent mag- 
nitude limit of the redshift sample varied across the sky 
in the commissioning phases of the survey, we cut the 
r-band magnitude limit from ~ 17.77 back to 17.5. This 
more conservative limit is uniform across the sky. 
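
A simplified version of this selection can be sketched as below. It is only an illustration under stated assumptions: it uses a flat cosmology with $\Omega_m = 0.3$ and h = 1, it ignores the k-corrections, the passive evolution, and the surface brightness cut that the actual selection includes, and all function names are hypothetical.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def luminosity_distance(z, omega_m=0.3, n_steps=2048):
    """Luminosity distance in h^-1 Mpc for a flat universe (H0 = 100h km/s/Mpc),
    using a trapezoidal integral of 1/E(z)."""
    zz = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))
    dz = zz[1] - zz[0]
    d_comoving = (C_KM_S / 100.0) * dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))
    return (1.0 + z) * d_comoving

def absolute_magnitude_limit(z_max, m_lim=17.5):
    """Faintest absolute magnitude visible at z_max for apparent limit m_lim
    (no k-correction or evolution correction: an assumption of this sketch)."""
    distance_modulus = 5.0 * np.log10(luminosity_distance(z_max) * 1.0e6 / 10.0)
    return m_lim - distance_modulus

def volume_limited_mask(z, abs_mag, z_min, z_max, m_lim=17.5):
    """Boolean mask of galaxies inside [z_min, z_max] that would still pass
    the apparent magnitude limit if placed at z_max."""
    z = np.asarray(z, dtype=float)
    abs_mag = np.asarray(abs_mag, dtype=float)
    threshold = absolute_magnitude_limit(z_max, m_lim)
    return (z >= z_min) & (z <= z_max) & (abs_mag <= threshold)
```

Because the corrections are omitted, the threshold returned here will not match the published sample thresholds exactly; the point is only that a deeper redshift limit forces a brighter luminosity threshold.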

We construct three such volume-limited samples. Figure 1 
shows these samples in the luminosity-redshift 
plane. Each dot in the figure shows a galaxy in the 
SDSS redshift survey. The sharp cutoff curve along the 
lower-right part of the plot shows our r = 17.5 apparent 
magnitude limit. We select three redshift ranges for 
our volume-limited samples: 0.015-0.1, 0.015-0.068, 
and 0.015-0.045. These samples are complete down 
to absolute r-band magnitudes of $M_{^{0.1}r} < -19.9$, $-19$, 
and $-18$, respectively. We refer to these samples as 
Mr20, Mr19, and Mr18, henceforth. Regions of the 
plot that make it into these three samples are shown in 
blue, green, and red, respectively. The limiting absolute 
magnitude of each sample changes slightly with redshift 
due to the passive evolution corrections applied to galaxy 
luminosities: as a galaxy is moved to the outer edge of 
a given volume-limited sample, its luminosity increases 
somewhat, allowing lower redshift galaxies to make it 
into the sample at lower luminosities than they do at 
higher redshifts. We choose the first limiting redshift 
of $z_{\rm max} = 0.1$ because this yields the largest possible 
volume-limited sample (largest number of galaxies). We 
choose lower redshift samples in order to probe galaxy 
populations less luminous than $L_*$. We use a lower redshift 
limit of 0.015 for all three samples to alleviate some 
of the problems associated with obtaining accurate photometry 
of nearby highly extended galaxies. The redshift 
limits, luminosity thresholds at $z_{\rm max}$, number of galaxies, 
and space densities of these samples are listed in Table 1. 

All absolute magnitudes are quoted for $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$, 
and a value of the Hubble constant $h \equiv H_0/(100\ {\rm km\,s^{-1}\,Mpc^{-1}}) = 1$. 
For other values of $H_0$, one should add $5\log h$ to the quoted 
absolute magnitudes. 

Fig. 2. Hammer (equal area) projection (in equatorial coordinates) of the SDSS volume-limited sample that goes out to redshift 0.1. 
Points represent galaxies in the sample. The solid curve shows the location of the Galactic plane. 

Figure 2 shows a Hammer (equal area) projection (in 
equatorial coordinates) of sample Mr20. Points repre- 
sent galaxies in the sample. The curve shows the lo- 
cation of the Galactic plane. The figure illustrates the 
patchy and non-uniform nature of the sample footprint 
on the sky, which has irregular edges, as well as multi- 
ple holes. This irregularity exacerbates systematic errors 
due to edge effects. We deal with incompleteness due to 
edge effects in § 5. 

Figure 3 shows an equatorial slice through sample 
Mr20. The slice is 4° thick and each point shows the 
RA and redshift of a galaxy in the sample. Prominent in 
this projection of the data is the giant supercluster 
at z ~ 0.08 at the left end of the Sloan Great Wall of 
Galaxies, which extends from longitude 132 degrees (at 
z ~ 0.05) to longitude 210 degrees (at z ~ 0.08) (see 
Gott et al. 2005). 

3. MOCK CATALOGS 



Our main scientific motivation for constructing group 
catalogs from the SDSS data requires that identified 
groups most closely resemble systems of galaxies that 
occupy a common dark matter halo. Moreover, it is im- 
portant that we statistically quantify the degree to which 
our groups do not satisfy this criterion. For both these 
reasons, it is imperative that we use mock galaxy cat- 
alogs that are constructed by populating dark matter 
halos in N-body simulations with mock galaxies. The N- 
body simulations must satisfy two basic conditions: they 
must contain a large enough volume to fit our largest 
volume-limited sample, Mr20, and they must resolve the 
smallest mass halos that can host a galaxy in our least 
luminous volume-limited sample, Mr18. HOD fits to the 
SDSS two-point correlation function of galaxies suggest 
that the minimum dark matter halo mass that can host 
a galaxy of luminosity $M_{^{0.1}r} \sim -18$ is approximately 
$2 \times 10^{11}\,h^{-1}M_\odot$ (Zehavi et al. 2005; Tinker et al. 2005). 
Requiring that a halo contain at least forty dark matter 
particles to be resolved means that we need N-body 
simulations with particle masses less than $5 \times 10^{9}\,h^{-1}M_\odot$. 
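
The resolution requirement is simple arithmetic, sketched below together with the standard relation between box size, particle number, and particle mass. The critical density value is the usual $2.775 \times 10^{11}\,h^{2}M_\odot\,{\rm Mpc}^{-3}$; the box size and particle counts in the usage note are hypothetical and are not taken from Table 2.

```python
RHO_CRIT = 2.775e11  # critical density in h^2 Msun / Mpc^3

def max_particle_mass(min_halo_mass, min_particles=40):
    """Largest particle mass that still resolves the smallest host halo."""
    return min_halo_mass / min_particles

def particle_mass(box_size, n_particles, omega_m=0.3):
    """Particle mass in h^-1 Msun for a box of side box_size (h^-1 Mpc)."""
    return omega_m * RHO_CRIT * box_size ** 3 / n_particles

def resolves(box_size, n_particles, min_halo_mass=2e11, min_particles=40):
    """True if the simulation resolves halos of min_halo_mass."""
    return particle_mass(box_size, n_particles) < max_particle_mass(min_halo_mass, min_particles)
```

For example, `max_particle_mass(2e11)` reproduces the limit quoted in the text, and a hypothetical 400 $h^{-1}$Mpc box would pass the test with $1280^3$ particles but not with $512^3$.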

We use a series of N-body simulations of a ΛCDM cosmological 
model, with $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$, $\Omega_b = 0.04$, 
$h \equiv H_0/(100\ {\rm km\,s^{-1}\,Mpc^{-1}}) = 0.7$, $n_s = 1.0$, and $\sigma_8 = 0.9$. 
This model is in good agreement with a wide variety 
of cosmological observations (see, e.g., Spergel et al. 
2003; Tegmark et al. 2004; Abazajian et al. 2005b). Initial 
conditions were set up using the transfer function 
calculated for this cosmological model by CMBFAST 
(Seljak & Zaldarriaga 1996). The simulations were run 
at Los Alamos National Laboratory (LANL) using the 
Hashed-Oct-Tree (HOT) code (Warren & Salmon 1993). 
We use a total of six independent simulations of varying 
size and resolution, which we refer to as LANL1-6. The 
size of box $L_{\rm box}$, number of particles $N_p$, and resulting 
particle mass $m_p$ for each simulation are listed in Table 2. 
The gravitational force softening is $\epsilon_{\rm grav} = 12\,h^{-1}$kpc 
(Plummer equivalent). 

We identify halos in the dark matter particle distribu- 
tions using a friends-of-friends algorithm with a linking 
length equal to 0.2 times the mean interparticle separa- 
tion. We then populate these halos with galaxies using 
a simple model for the HOD of galaxies more luminous 
than a luminosity threshold. Every halo with a mass 
M greater than a minimum mass Mmin gets a central 
galaxy that is placed at the halo center of mass and is 
given the mean halo velocity. A number of satellite galaxies 
is then drawn from a Poisson distribution with mean 
$\langle N_{\rm sat}\rangle = ((M - M_{\rm min})/M_1)^{\alpha}$, for $M > M_{\rm min}$. These 
satellite galaxies are assigned the positions and velocities 
of randomly selected dark matter particles within 
the halo. In order to construct mock catalogs for each 
of our three volume-limited samples Mr20, Mr19, and 
Mr18, we select sets of values for the parameters $M_{\rm min}$, 
$M_1$, and $\alpha$ that yield the observed Zehavi et al. (2005) 
galaxy-galaxy correlation functions for these samples. 
These HOD parameter values are similar to the best-fit 
values given by Zehavi et al. (2005) (they are slightly 
different because the model for $\langle N_{\rm sat}\rangle$ was different in 
that paper). We refer to these sets of mock catalogs 
with the suffixes .Mr20, .Mr19, and .Mr18. In addition 
to these mock catalogs, we construct a set of catalogs 
for the Mr20 sample using an alternative HOD model, 
where the mean number of satellites in a halo of mass 
M is $\langle N_{\rm sat}\rangle = \exp[-M_{\rm cut}/(M - M_{\rm min})]\,(M/M_1)^{\alpha}$, for 
$M > M_{\rm min}$ (also used by Tinker et al. 2005). We fix 
the value of the slope $\alpha$ to 0.9, which is lower than 
that for the .Mr20 mocks, and we choose values for 
the remaining HOD parameters that yield the observed 
Zehavi et al. (2005) correlation function of $M_{^{0.1}r} < -20$ 
galaxies. We refer to these sets of mock catalogs with 
the suffix .Mr20b. The values for all mock HOD parameters 
are listed in Table 2. We construct ten realizations 
of each mock catalog listed in Table 2 by using different 
random number generator seeds when we (a) draw a 
number of satellite galaxies for each halo from a Poisson 
distribution of mean $\langle N_{\rm sat}\rangle$, and (b) select random dark 
matter halo particles to give their positions and velocities 
to these satellite galaxies. The dispersion among the ten 
realizations for one mock catalog therefore represents the 
scatter among possible observed states for a given halo 
distribution and HOD model. 
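
The occupation step for the first HOD model can be sketched as below. This is a minimal illustration, assuming halo masses are available as a NumPy array; the function names and the parameter values in the usage note are hypothetical, and the assignment of positions and velocities from dark matter particles is left out.

```python
import numpy as np

def mean_nsat(halo_mass, m_min, m_1, alpha):
    """Mean satellite number ((M - M_min)/M_1)^alpha for M > M_min, else 0."""
    halo_mass = np.asarray(halo_mass, dtype=float)
    mean = np.zeros_like(halo_mass)
    above = halo_mass > m_min
    mean[above] = ((halo_mass[above] - m_min) / m_1) ** alpha
    return mean

def populate_halos(halo_mass, m_min, m_1, alpha, rng):
    """Return (n_central, n_satellite) per halo: one central for every halo
    with M > M_min, plus a Poisson-distributed number of satellites."""
    halo_mass = np.asarray(halo_mass, dtype=float)
    n_central = (halo_mass > m_min).astype(int)
    n_satellite = rng.poisson(mean_nsat(halo_mass, m_min, m_1, alpha))
    return n_central, n_satellite
```

Calling `populate_halos(masses, 1e12, 2e13, 1.0, np.random.default_rng(seed))` with different seeds gives different realizations of the same halo distribution and HOD, which is the sense in which the ten realizations per mock catalog differ from each other.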

We now have a set of mock catalogs containing galaxies 
in real space and in the cubical geometry of the N-body 
simulations. We refer to these as our "real-space cube 
mocks" . We create a redshift-space version of these cat- 
alogs by assuming the distant observer approximation 
and aligning the line-of-sight along one of the axes of the 
simulation cubes. We use the mock galaxies' peculiar ve- 
locities to move them along the line-of-sight into redshift 
space. We refer to the resulting mock catalogs as our 
"redshift-space cube mocks". We use these real-space 
and redshift-space cube mocks to determine optimal pa- 
rameters for our group-finding algorithm. We summarize 
this determination in § 4 and discuss details in the Appendix. 
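
Under the distant observer approximation, moving a cube mock into redshift space amounts to shifting one coordinate by the peculiar velocity divided by the Hubble constant. The sketch below assumes positions in $h^{-1}$Mpc, velocities in km/s, a z = 0 snapshot, and periodic wrapping of galaxies that leave the box; the wrapping and the function name are assumptions of the example rather than details stated in the text.

```python
import numpy as np

def to_redshift_space(pos, vel, box_size, los_axis=2, h0=100.0):
    """Shift positions along los_axis by v_los / H0 and wrap periodically.

    pos: (N, 3) positions in h^-1 Mpc; vel: (N, 3) peculiar velocities in km/s;
    h0 = 100 because distances are in h^-1 Mpc.
    """
    s = np.array(pos, dtype=float)
    v = np.asarray(vel, dtype=float)
    s[:, los_axis] = (s[:, los_axis] + v[:, los_axis] / h0) % box_size
    return s
```

Only the line-of-sight coordinate changes, so projected separations between galaxies are the same in the real-space and redshift-space cube mocks.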



For the purpose of studying the effects of SDSS incom- 
pleteness on our measured groups, as well as for obtaining 
estimates of the uncertainty in our measured group mul- 
tiplicity function, we also require mock catalogs that have 
the same geometry as our SDSS volume-limited samples. 
The total volume of our largest sample, Mr20, is approximately 
$210^3\,h^{-3}{\rm Mpc}^3$, which is more than six times 
smaller than any of our mock cubes. However, the SDSS 
geometry is highly irregular (as seen in Fig. 2) and can 
only be fully embedded in a cube of much larger volume 
than the survey itself. The Mr20 sample, for example, 
has a maximum extent of ~ $600\,h^{-1}$Mpc when both the 
North and South Galactic portions are included. In order 
to carve this sample geometry out of our mock catalogs, 
we create mock cubes with eight times larger volume by 
tiling each mock cube 2 × 2 × 2. Since the N-body simulations 
used to construct the mocks were run with periodic 
boundary conditions, we can tile the cubes without hav- 
ing density discontinuities at the boundaries. We set the 
center of this tiled cube to be the origin and put galaxies 
into redshift space using the line-of-sight component of 
their peculiar velocities. We then compute RA, DEC, 
and redshift coordinates for every mock galaxy in the 
tiled cube. Finally, we only keep galaxies whose coor- 
dinates on the sky would place them in regions of the 
SDSS survey that have completeness greater than 90%, 
and whose redshifts lie within the redshift limits of the 
specific volume-limited sample we are constructing mock 
catalogs for. 
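
The tiling and the conversion to sky coordinates can be sketched as follows. The redshift here uses the low-redshift approximation $cz = H_0 r + v_{\rm los}$, which is an assumption of the example (a full distance-redshift relation could be substituted), and the function names are hypothetical. The final cut against the survey completeness map is not shown.

```python
import numpy as np

def tile_cube(pos, vel, box_size):
    """Replicate a periodic cube 2 x 2 x 2 and put the origin at its center."""
    shifts = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)],
                      dtype=float) * box_size
    tiled_pos = (np.asarray(pos, dtype=float)[None, :, :] + shifts[:, None, :]).reshape(-1, 3)
    tiled_vel = np.tile(np.asarray(vel, dtype=float), (8, 1))
    return tiled_pos - box_size, tiled_vel

def to_sky(pos, vel, h0=100.0, c=299792.458):
    """RA, Dec (degrees) and redshift seen by an observer at the origin.

    Assumes no galaxy sits exactly at the origin.
    """
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    r = np.linalg.norm(pos, axis=1)
    v_los = np.einsum('ij,ij->i', pos, vel) / r  # radial velocity component
    redshift = (h0 * r + v_los) / c
    ra = np.degrees(np.arctan2(pos[:, 1], pos[:, 0])) % 360.0
    dec = np.degrees(np.arcsin(pos[:, 2] / r))
    return ra, dec, redshift
```

Velocities are copied unchanged into each replica, so the tiled cube has no density or velocity discontinuities at the internal boundaries.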

Since the volume of each simulation cube is at least 
six times larger than our largest volume-limited sample 
Mr20, we try to carve out as many independent volumes 
with the Mr20 geometry as possible without too much 
overlap. We do this by performing many sets of three 
rotations (one around each Cartesian axis) and testing 
how much overlap the resulting catalogs have with each 
other (i.e., how many common mock galaxies do they 
share). With the right combination of rotation angles, 
we can carve out two Mr20 mock catalogs that share 
fewer than 3% of their galaxies with each other, but we 
cannot obtain more without significant overlap. We cre- 
ate two such independent mock catalogs, with the cor- 
rect SDSS geometry, from every one of the ten HOD 
realizations of the mock cubes listed in Table 2, except 
for the LANL6.Mr20 mock. This procedure yields 200 
mock catalogs for the Mr20 sample (5 N-body simulations 
× 2 HOD models × 10 HOD realizations × 2 
mocks per simulation cube), and 80 mock catalogs each 
for the Mr19 and Mr18 samples (4 N-body simulations 
× 1 HOD model × 10 HOD realizations × 2 mocks per 
simulation cube). 

Table 2. Mock Catalog Parameters 

The final step in creating mock SDSS catalogs is to incorporate 
the fiber collision constraint. We use a friends-of-friends 
algorithm to identify groups of mock galaxies 
that are linked together by the 55" minimum angular separation 
of fibers. We then select "collided" mock galaxies 
(whose redshifts will be unknown) in each such collision 
group in a way that minimizes the number of such galaxies. 
For example, if a collision group contains three galaxies 
in a row, where the first is closer than 55" from the 
second and the second is closer than 55" from the third, 
but the first is more than 55" from the third, we will 
always select the middle galaxy to be the collided one. 
In cases where multiple choices yield the same number 
of collided galaxies, we select randomly (e.g., in collision 
groups with only two galaxies). This procedure is designed 
to mimic the tiling code that assigns spectroscopic 
fibers to SDSS target galaxies (Blanton et al. 2003c). If 
we perform this operation on the .Mr20 catalogs we end 
up with only ~ 3% of mock galaxies being tagged as collided. 
This is about half the fraction of SDSS galaxies 
in our Mr20 sample that do not have measured redshifts 
due to fiber collisions. The reason for this discrepancy 
is that galaxies in the Mr20 volume-limited sample do 
not only collide with each other; they also collide with 
galaxies more luminous than $M_{^{0.1}r} \sim -20$ at redshifts 
higher than the sample limit z = 0.1 and galaxies less 
luminous than $M_{^{0.1}r} \sim -20$ at lower redshifts. Most of 
these additional galaxies that can collide with a given 
galaxy in Mr20 are uncorrelated background or foreground 
galaxies. It is therefore sufficient to model them 
as a background screen of galaxies on the sky that have 
an angular correlation function equal to the mean for all 
SDSS galaxies. For this purpose, we use the very large 
volume LANL6.Mr20 cube mock. We use LANL6.Mr20 to 
construct a "screen" catalog with the correct SDSS angular 
geometry and a variable outer redshift limit, and 
superpose it onto each of our .Mr20, .Mr19, and .Mr18 
mock catalogs. We then allow all galaxies to collide with 
each other and keep track of collided mock galaxies. We 
set the outer redshift limit of the screen catalog to the 
value that results in ~ 6% of mock galaxies being tagged 
as collided. We find that we need approximately seven 
times more galaxies in the screen catalog than in the 
mocks in order to achieve this collided fraction. 
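
The choice of collided galaxies within one collision group can be illustrated with a brute-force search. Minimizing the number of collided galaxies is the same as keeping the largest set of galaxies in which no two are closer than 55"; the sketch below enumerates such sets directly, which is only practical for small groups. The function name and the input format (a list of index pairs that are too close) are assumptions of the example, not the tiling code itself.

```python
from itertools import combinations
import random

def choose_collided(n_galaxies, close_pairs, rng):
    """Return sorted indices of collided galaxies in one collision group.

    close_pairs: iterable of (i, j) index pairs separated by less than 55".
    Keeps the largest conflict-free subset; ties are broken randomly.
    """
    conflicts = {frozenset(pair) for pair in close_pairs}
    for n_keep in range(n_galaxies, 0, -1):
        valid = [keep for keep in combinations(range(n_galaxies), n_keep)
                 if not any(frozenset(pair) in conflicts
                            for pair in combinations(keep, 2))]
        if valid:
            keep = rng.choice(valid)
            return sorted(set(range(n_galaxies)) - set(keep))
    return []
```

For three galaxies in a row, `choose_collided(3, [(0, 1), (1, 2)], random.Random(1))` tags only the middle galaxy, matching the example in the text, while for an isolated pair either member is tagged with equal probability.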

Using this approach we construct three versions of ev- 
ery mock catalog described above: a version with no fiber 
collisions applied ("true" version), a version where collided 
galaxies have no redshifts and are dropped out of 
the mock catalog altogether ("uncorrected" version), and 
a version where collided galaxies are assigned the redshift 
of the galaxy they collided with ("corrected" version). 
These mock catalogs allow us to test the effects of fiber 
collisions on our measured group multiplicity function 
(discussed in § 5). 

4. GROUP-FINDING ALGORITHM 

We wish to identify galaxy groups primarily in order 
to measure the group multiplicity function and use it to 
constrain the HOD of galaxies as a function of galaxy 
properties. This goal places a number of demands on 
the group-finding algorithm: (1) It should identify galaxy 
systems that occupy the same dark matter halos with the 
least possible merging of different halos into the same 
group and the least possible splitting of individual halos 
into multiple groups. (2) It should produce a group mul- 
tiplicity function that is unbiased with respect to the halo 
multiplicity function. (3) It should be simple and well- 
defined so that the statistical and systematic uncertainty 
in the measured group multiplicity function can be ac- 
curately characterized. (4) It should use only the spatial 
positions of galaxies in redshift space to identify groups, 
and not galaxy properties such as color or luminosity. 
These requirements point to an algorithm that uniquely 
identifies density enhancements in redshift space. 

We adopt the simple and well understood friends-of-friends 
(FoF) algorithm, where galaxies are recursively 
linked to other galaxies within a specified linking volume 
around each galaxy. The FoF algorithm has several attractive 
features. First, for a given linking volume (usually 
specified by one linking length in real space and two 
linking lengths in redshift space), FoF produces a unique 
group catalog. Second, it does not assume or enforce 
any particular geometry for groups (e.g., spherical), but 
rather identifies structures that are approximately en- 
closed by an isodensity surface whose density is mono- 
tonically related to the linking lengths. Third, the algo- 
rithm satisfies a nesting condition: all the members of a 
group identified with one set of linking lengths are also 
members of the same group identified using larger linking 
lengths. 

The FoF algorithm has been used extensively to
identify dark matter halos in N-body simulations (e.g.,
Davis et al. 1985) and has been shown to produce
halo catalogs with mass functions that are close to
universal (within ~ 20%) for a wide range of epochs
and cosmological models (Jenkins et al. 2001). FoF
has also been the most used algorithm for identifying
galaxy groups in redshift surveys (Huchra & Geller
1982; Geller & Huchra 1983; Nolthenius & White 1987;
Ramella et al. 1989; Moore et al. 1993; Ramella et al.
1997, 1999; Tucker et al. 2000; Giuricin et al. 2000;
Ramella et al. 2002; Merchán & Zandivarez 2002;
Eke et al. 2004), though alternative methods have also
been used (see e.g., Tully 1987; Marinoni et al. 2002;
Gerke et al. 2005; Yang et al. 2005). These FoF studies
all used the same basic algorithm, but differed in their
choices for linking lengths and in their methods for
dealing with the varying density of galaxies inherent in
flux-limited surveys.

We use the basic Huchra & Geller (1982) algorithm,
where two galaxies are linked to each other if both their
transverse and line-of-sight separations are smaller than
a given pair of projected and line-of-sight linking lengths,
respectively. Specifically, two galaxies $i$ and $j$ with angular
separation $\theta_{ij}$ and redshifts $z_i$ and $z_j$ have a projected
separation $D_{\perp,ij}$ and a line-of-sight separation $D_{\parallel,ij}$
(both in $h^{-1}$ Mpc) given by

$$D_{\perp,ij} = (c/H_0)(z_i + z_j)\sin(\theta_{ij}/2), \tag{1}$$
$$D_{\parallel,ij} = (c/H_0)\,|z_i - z_j|. \tag{2}$$

The two galaxies are then linked to each other if

$$D_{\perp,ij} < b_\perp\,\bar{n}_g^{-1/3} \tag{3}$$

and

$$D_{\parallel,ij} < b_\parallel\,\bar{n}_g^{-1/3}, \tag{4}$$

where $\bar{n}_g$ is the mean number density of galaxies, and $b_\perp$
and $b_\parallel$ are the projected and line-of-sight linking lengths
in units of the mean intergalaxy separation. Since we
use volume-limited samples of SDSS galaxies, $\bar{n}_g$ is constant
throughout the sample volumes, and thus the linking
lengths are also constant.

The resulting linking volume around each galaxy is 
very similar to a cylinder, oriented along the line-of-sight, 
whose radius is equal to the projected linking length and 
whose height is equal to twice the line-of-sight linking 
length. It is not a perfect cylinder because its radius 
increases with redshift, making it slightly wider at the 
far end than at the near end, and its bases are slightly 
curved. However, for the small linking lengths considered 
here, a cylinder is a good approximation. The FoF algorithm
works recursively, whereby a galaxy is linked to all
its "friends", which are in turn linked to their "friends",
etc., to yield a unique group of galaxies.
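
The linking criterion of equations (1)-(4) and the recursive grouping can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code used to build the catalogs: the function name `fof_groups`, the brute-force pair search, the union-find bookkeeping, and the number density used in the example are all hypothetical choices made here for clarity.

```python
import numpy as np

C_OVER_H0 = 2997.92458  # c/H0 in h^-1 Mpc, for H0 = 100 h km/s/Mpc


def fof_groups(ra_deg, dec_deg, z, b_perp, b_par, n_g):
    """Return one integer group label per galaxy (hypothetical helper).

    Two galaxies are linked when both separations of equations (1)-(2)
    are smaller than the linking lengths of equations (3)-(4).
    """
    ra = np.radians(np.asarray(ra_deg, dtype=float))
    dec = np.radians(np.asarray(dec_deg, dtype=float))
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Unit vectors on the sky; the chord between two of them is 2 sin(theta/2).
    unit = np.column_stack([np.cos(dec) * np.cos(ra),
                            np.cos(dec) * np.sin(ra),
                            np.sin(dec)])
    link_perp = b_perp * n_g ** (-1.0 / 3.0)
    link_par = b_par * n_g ** (-1.0 / 3.0)

    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        sin_half = 0.5 * np.linalg.norm(unit[i + 1:] - unit[i], axis=1)
        d_perp = C_OVER_H0 * (z[i] + z[i + 1:]) * sin_half   # equation (1)
        d_par = C_OVER_H0 * np.abs(z[i] - z[i + 1:])         # equation (2)
        linked = np.nonzero((d_perp < link_perp) & (d_par < link_par))[0]
        for k in linked:
            root_i, root_j = find(i), find(i + 1 + int(k))
            if root_i != root_j:
                parent[root_j] = root_i  # merge the two friend networks

    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels


# Example with a hypothetical number density of 0.005 h^3 Mpc^-3.
example_labels = fof_groups([150.0, 150.1, 160.0, 150.35],
                            [2.0, 2.0, 2.0, 2.0],
                            [0.05, 0.0505, 0.05, 0.0505],
                            b_perp=0.14, b_par=0.75, n_g=0.005)
```

In the example, the fourth galaxy is too far from the first to be linked directly, but it joins the same group through the second galaxy, which is the "friends of friends" behavior described above. The pair search is quadratic in the number of galaxies; a tree-based neighbor search would be the natural replacement for large samples.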

4.1. Choice of Linking Lengths 

The most important ingredient of our group-finding algorithm
is our choice for the linking lengths $b_\perp$ and $b_\parallel$. If
the linking lengths are too small, then the group-finder 
will break up single halos into multiple groups. If the 
linking lengths are too large, then different halos will be 
fused together into single groups. There are no values
for the linking lengths that will work perfectly for every
halo, even in real space. In redshift space this problem
becomes substantially worse, since redshift-space distortions
both move halos and elongate them along the line-of-sight,
often causing them to overlap with each other.
The right choice of linking lengths depends on the purpose
for which groups are being identified. If we require a
group catalog that is highly inclusive and groups together
every galaxy inhabiting the same halo, then we will use
larger linking lengths than if our goal is to minimize contamination
by galaxies that come from different halos.
For our purposes, we wish to obtain a balance between
being inclusive and reducing contamination, while producing
groups that have an unbiased multiplicity function.

Footnote to equations (1) and (2): We use these simple equations, rather than the exact formulae
for the redshift-distance and angular diameter-distance relations,
because, at z = 0.1 (the outer limit of our sample), the difference
between these formulae is less than 1%.

In order to find the right combination of linking
lengths, we use the mock galaxy catalogs described in
§ 3. Specifically, we use the real- and redshift-space
cube mocks, which are constructed by applying simple
HOD models to the LANL1 and LANL4 N-body simulations.
Since we know which mock galaxies occupy the
same dark matter halos, we can evaluate how well a par- 
ticular choice of linking lengths recovers features of the 
halo population. The mocks that we use here have a 
cubical geometry, and we assume the distant observer 
approximation when we put mock galaxies into redshift 
space. We use the full cubical mocks rather than those 
with the correct SDSS geometry because the full mocks 
have a much larger volume and thus better statistics. 
Moreover, our goal is to find the best linking lengths for 
any redshift survey, and we will deal with systematic ef- 
fects specific to our SDSS sample geometry separately. 
The FoF algorithm that we use is therefore slightly different
from the one outlined above, in that the linking
volume is a perfect cylinder (i.e., $D_{\perp,ij}$ is simply the
projected distance between two mock galaxies).

We run the FoF group-finder on the mock catalogs for
a grid of linking length values, and we study the properties
of the resulting group catalogs. Specifically, we
investigate four features of the recovered group distribution:
(1) the group multiplicity function compared
to the "true" halo multiplicity function; (2) the relation
between the number of galaxies in a halo, $N_{\rm true}$, and
the number of galaxies in its associated group, $N_{\rm obs}$; (3)
the distribution of projected group sizes as a function
of group richness compared to the "true" distribution of
projected halo sizes as a function of halo multiplicity;
(4) the distribution of group velocity dispersions as a
function of group richness compared to the "true" distribution
of halo velocity dispersions as a function of halo
multiplicity.

We check how each set of linking lengths performs in
the above four tests, for each of the four HOD model
mock cubes (.Mr20, .Mr20b, .Mr19, .Mr18). In the
case of each HOD model, we average results over the
10 HOD realizations described in § 3 and over the LANL1
and LANL4 N-body simulations. We do this procedure for
groups that are identified in both real space (for which
there is only one linking length) and redshift space.
These tests are described in detail in the Appendix. Here 
we summarize the main results. 

In real space, a linking length choice of $b = 0.2$ yields
galaxy groups with ten or more members that pass all
four tests listed above. Groups with $N < 10$ show systematic
deviations in abundance, multiplicity, projected
sizes, and velocity dispersions from the corresponding halos
with $N < 10$. The choice of $b = 0.2$ is not surprising,
given that the same linking length was used to identify
halos in the N-body simulations. It is also not surprising
that the group-finding fails the tests for small groups,
where adding or losing a couple of galaxies makes a large
fractional difference to the group size. The threshold of
$N \geq 10$ is independent of the underlying dark matter
halo mass. This means that we can push the regime 
in which the groups are reliable to lower mass systems 
by using a lower luminosity sample (where each halo will 
contain more galaxies). Of course, the change of luminos- 
ity threshold comes at the expense of statistical power, 
since low luminosity samples have smaller volumes than 
high luminosity samples. The number of groups in a 
volume-limited sample scales roughly with the number 
of galaxies, and a luminosity threshold near the charac- 
teristic luminosity maximizes this number. 

In redshift space the situation is more complicated. No 
set of transverse and line-of-sight linking lengths is able 
to produce groups that pass all four tests listed above, 
even for large groups. Figure 3 summarizes our tests
for the .Mr20 HOD model mocks. Results for the other
HOD models are similar and are shown in the Appendix.
The figure shows regions (shaded) of the two-dimensional
linking length space ($b_\parallel$ vs. $b_\perp$) that pass each of our four
tests.

4.1.1. Multiplicity Function 

The dark and thin shaded region in Figure 3, labeled
$n(N)$, shows linking lengths that pass the group multiplicity
function test. In other words, these linking lengths
yield mock group catalogs whose multiplicity functions
are unbiased relative to the "true" input halo multiplicity
function, in the regime $N \geq 10$. In this case, "unbiased"
means that the shape of the multiplicity function
is on average the same as the "true" shape and its amplitude
is within 10% of the "true" amplitude. Linking
length values that lie along the upper boundary of the
shaded region (e.g., the values $b_\perp = 0.11$, $b_\parallel = 1.5$) yield
multiplicity functions that are 10% too high in amplitude,
whereas values that lie along the lower boundary
yield multiplicity functions whose amplitudes are 10%
too low. These results show that an increase in either
linking length generally leads to an increase in the multiplicity
function for $N \geq 10$. This increase is compensated
for by a corresponding decrease in the abundance
of isolated (i.e., $N = 1$) and low $N$ groups. The shaded
region appears to be close to horizontal only because the 
vertical axis is highly compressed with respect to the 
horizontal axis. 
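
The multiplicity function test can be illustrated with a short sketch. The version below is a simplification: it measures a binned differential abundance and then requires every bin to lie within 10% of the halo value, whereas the test described above compares average shape and amplitude. The function names, binning, and toy richness lists are hypothetical.

```python
import numpy as np


def multiplicity_function(richness, volume, bin_edges):
    """Groups per unit volume per unit N in each richness bin (hypothetical helper)."""
    counts, _ = np.histogram(richness, bins=bin_edges)
    return counts / (volume * np.diff(bin_edges))


def passes_amplitude_test(n_group, n_halo, tol=0.10):
    """True if the group abundance is within `tol` of the halo abundance in all bins.

    Assumes every bin contains at least one halo, so the ratio is defined.
    """
    ratio = np.asarray(n_group) / np.asarray(n_halo)
    return bool(np.all(np.abs(ratio - 1.0) <= tol))


# Toy example: richness values for "true" halos and for recovered groups.
bin_edges = [10, 20, 50]
volume = 1.0e6  # arbitrary volume units
n_halo = multiplicity_function([10] * 20 + [25] * 10, volume, bin_edges)
n_good = multiplicity_function([11] * 21 + [30] * 10, volume, bin_edges)
n_bad = multiplicity_function([11] * 25 + [30] * 10, volume, bin_edges)
```

Here `n_good` differs from `n_halo` by 5% in the first bin and passes, while `n_bad` is 25% too high in that bin and fails.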

4.1.2. $N_{\rm true}$ vs. $N_{\rm obs}$

Fig. 3. Regions of the FoF linking length parameter space
that do well in recovering galaxy groups that have similar properties
to their parent halos. Each shaded region shows the combination
of perpendicular and line-of-sight FoF linking lengths that
is successful in recovering a particular feature of the group distribution,
measured using mock galaxy catalogs. The four features
are: (a) the group multiplicity function (black region); (b) the relation
between halo and group richness for halos and groups that are
matched one-to-one (green region); (c) the projected sizes of groups
as a function of group richness (blue region); (d) the line-of-sight
velocity dispersion of groups as a function of group richness (red
region). The yellow star denotes the FoF parameters that we apply
to identify groups in the SDSS.

The group multiplicity function is an average statistic
showing the abundance of all groups as a function of $N$.
It is therefore possible, in principle, for it to be unbiased
relative to the halo multiplicity function, without
the relation between individual halo multiplicities and
their recovered group multiplicities being correct. For
this reason, we also require that the group-finder yield
an unbiased relation between the multiplicity of individual
halos, $N_{\rm true}$, and their recovered groups, $N_{\rm obs}$. In
order to check this, we must match input halos to recovered
groups in a one-to-one way. There are many ways
to do this matching, and no one way is more correct than
another. For example, a halo can be associated with the
group that contains most of its galaxies, or the group that
contains its central galaxy, or the group whose centroid
is closest to the halo center. We associate each halo with
the group that contains its central galaxy. When two or
more halos are matched to the same group, we choose the
halo that shares the largest number of common galaxies
with the group. Halos that are not associated with any
group are considered "undetected," and groups that are
not associated with any halo (because they don't contain
any halo central galaxies) are considered "spurious".
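
The matching rule described above (associate by central galaxy, break ties by the number of shared galaxies) can be sketched as follows. The per-galaxy input lists and the function name `match_halos_to_groups` are hypothetical; the sketch only illustrates the bookkeeping.

```python
from collections import Counter, defaultdict


def match_halos_to_groups(halo_id, group_id, is_central):
    """One-to-one halo/group matching from per-galaxy labels (hypothetical helper).

    Returns (matches, undetected, spurious), where `matches` maps
    group label -> halo label.
    """
    # Number of galaxies shared by each (halo, group) pair.
    shared = Counter(zip(halo_id, group_id))

    # Each halo is a candidate for the group holding its central galaxy.
    candidates = defaultdict(list)
    for halo, group, central in zip(halo_id, group_id, is_central):
        if central:
            candidates[group].append(halo)

    # When several halos point to one group, keep the halo sharing most galaxies.
    matches = {group: max(halos, key=lambda h: shared[(h, group)])
               for group, halos in candidates.items()}

    undetected = set(halo_id) - set(matches.values())
    spurious = set(group_id) - set(matches)
    return matches, undetected, spurious


# Toy example: halos A and B are merged into group 1; halo C is split
# between group 2 (which holds its central) and group 3.
halo_id = ["A", "A", "A", "B", "B", "C", "C"]
group_id = [1, 1, 1, 1, 1, 2, 3]
is_central = [True, False, False, True, False, True, False]
matches, undetected, spurious = match_halos_to_groups(halo_id, group_id, is_central)
```

In this toy case group 1 is matched to halo A (three shared galaxies against two for halo B), halo B is left undetected, and group 3 is spurious because it contains no central galaxy. The halo completeness and spurious fraction then follow from the sizes of these sets.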

The light (and green) shaded region in Figure 3 that
roughly tracks and is slightly wider than the $n(N)$ region
shows linking lengths that pass the $N_{\rm true}$ vs. $N_{\rm obs}$
test. In other words, these linking lengths yield mock
group catalogs with an unbiased median relation between
$N_{\rm true}$ and $N_{\rm obs}$ for associated halos and groups, in the
regime $N \geq 10$. We consider the relation to be unbiased
if its slope is within 10% of unity. Linking length
values that lie along the upper boundary of the shaded
region yield associated halos and groups with a median
relation $N_{\rm true} = 1.1 N_{\rm obs}$, whereas values that lie along
the lower boundary yield the relation $N_{\rm true} = 0.9 N_{\rm obs}$.
As expected, most linking lengths that pass the multiplicity
function test also pass the $N_{\rm true}$ vs. $N_{\rm obs}$ test.
This breaks down, however, for values of $b_\perp$ greater than
0.16-0.17.
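
As a rough illustration of this test, the slope of the median relation can be approximated by the median ratio of halo to group richness for matched pairs. This is a simplifying assumption made here, not necessarily the estimator used for Figure 3, and the function names are hypothetical.

```python
import numpy as np


def median_richness_ratio(n_true, n_obs, n_min=10):
    """Median of N_true / N_obs over matched pairs with N_obs >= n_min."""
    n_true = np.asarray(n_true, dtype=float)
    n_obs = np.asarray(n_obs, dtype=float)
    keep = n_obs >= n_min
    return float(np.median(n_true[keep] / n_obs[keep]))


def is_unbiased(ratio, tol=0.10):
    """True if the ratio lies within `tol` of unity."""
    return abs(ratio - 1.0) <= tol


# Toy matched pairs; the last pair is below the richness threshold and ignored.
ratio_ok = median_richness_ratio([10, 12, 20, 30, 5], [10, 11, 22, 30, 4])
ratio_biased = median_richness_ratio([15, 30, 45], [10, 20, 30])
```

The first toy sample gives a median ratio of 1.0 and passes; the second gives 1.5 and fails.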

4.1.3. Projected Sizes 






The (blue) shaded region in Figure 3, labeled "Projected
sizes", shows linking lengths that pass the projected
sizes test. These linking lengths yield mock groups
with an unbiased median relation between rms projected
size and group multiplicity $N$, in the regime $N \geq 10$. We
consider the relation to be unbiased if it is within 10% of
the "true" relation between median rms projected halo
size and halo multiplicity. This shaded region is roughly
vertically oriented because the projected linking length
$b_\perp$ affects the projected sizes of groups much more than
the line-of-sight linking length $b_\parallel$. Clearly, increasing
$b_\perp$ leads to galaxy groups with larger projected sizes.
The shaded region is not completely vertical, however,
because increasing $b_\parallel$ also leads to groups with larger
projected sizes, albeit in a much less sensitive way.

4.1.4. Velocity Dispersions 

The (red) shaded region in Figure 3, labeled "Velocity
dispersions", shows linking lengths that pass the velocity
dispersion test. These linking lengths yield mock groups
with an unbiased median relation between group velocity
dispersion and group multiplicity $N$, in the regime
$N \geq 10$. We consider the relation to be unbiased if
it is within 10% of the "true" relation between median
halo velocity dispersion and halo multiplicity. This
shaded region is roughly horizontally oriented because
the line-of-sight linking length $b_\parallel$ affects the velocity
dispersions of groups much more than $b_\perp$. Clearly, increasing
$b_\parallel$ leads to galaxy groups with larger velocity dispersions.
The shaded region is not completely horizontal,
because changing $b_\perp$ also affects the velocity dispersions
of groups, though not consistently in the same sense.

4.1.5. Our Adopted Linking Lengths 

It is obvious from Figure 3 that no combination of FoF
linking lengths passes all four tests listed above. We can
choose linking lengths that successfully recover the abun- 
dance and projected sizes, or the abundance and velocity 
dispersions of groups as a function of multiplicity, but 
not all three simultaneously. We can also choose linking 
lengths that successfully recover both the projected sizes 
and velocity dispersions of groups as a function of multi- 
plicity, but since the multiplicity function of such groups 
is incorrect, the overall size and velocity dispersion dis- 
tributions will also be incorrect. This failure to recover 
all features of groups in redshift space is a fundamental 
shortcoming of the FoF group-finder when applied to red- 
shift space. Given that most redshift-space group-finding 
algorithms operate on very similar principles, i.e., they 
identify overdense regions that are elongated along the 
line-of-sight, it is likely that this shortcoming is shared 
by other group-finders as well. To our knowledge, no 
group-finder has been shown to pass all four of the tests 
considered here for a single choice of parameters. 

Figure 3 shows that in order to recover groups with
unbiased velocity dispersions, the line-of-sight linking
length must be substantially larger than the mean intergalaxy
separation. With $b_\parallel$ that large, groups are bound
to be linked together along the line-of-sight. The only
way to then obtain groups with the correct multiplic- 
ity function is to have a transverse linking length small 
enough that galaxies in the outer parts of halos are not 
included in the recovered groups. The resulting groups
bear little physical resemblance to their parent halos. If,
on the other hand, we recover groups with unbiased pro- 
jected sizes, then the groups will be missing some of their 
fastest moving galaxies and this decrease in multiplicity 
will be compensated by including as group members a 
few galaxies in the infall regions of halos. These groups 
are much more physically similar to their parent halos. 
For this reason, we choose to sacrifice velocity disper- 
sions, rather than projected sizes, when selecting values 
for the FoF linking lengths. 

Figure 3 shows the linking length values that we adopt
and use in this paper (yellow star). These values are

$$b_\perp = 0.14, \quad b_\parallel = 0.75. \tag{5}$$

Our mock catalog tests show that the FoF algorithm with
these linking lengths finds galaxy groups with $N \geq 10$
that have: (1) an unbiased multiplicity function; (2) an 
unbiased median relation between the multiplicities of 
groups and their associated halos; (3) a spurious group 
fraction of less than ~ 1%; (4) a halo completeness (frac- 
tion of halos that are associated one-to-one with groups) 
of more than ~ 97%; (5) the correct projected size dis- 
tribution as a function of multiplicity; (6) a velocity dis- 
persion distribution that is ~ 20% too low at all mul- 
tiplicities. These results hold for all of the mock cata- 
logs that we have used (see results for other HOD mod- 
els in the Appendix) and are thus not very sensitive to 
the HOD model assumed or to the specific realization of 
the underlying density field. We note that our adopted 
group-finder only has the above properties when dark 
matter halos are defined using a FoF algorithm with a 
linking length of 0.2 times the mean interparticle separa- 
tion, since that was the definition used to construct our 
mock catalogs. A different halo definition (such as FoF 
using a different linking length, or a spherical overdensity 
halo-finder) will result in a different optimal group-finder.
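
Because the adopted linking lengths are expressed in units of the mean intergalaxy separation, their physical size depends on the sample number density. The sketch below converts them to $h^{-1}$ Mpc and to an equivalent velocity difference, using $c\,\Delta z = H_0 D_\parallel$ from equation (2). The number density in the example is a hypothetical round value chosen for easy arithmetic, not one of the SDSS sample densities, and the function name is illustrative.

```python
def linking_lengths(n_g, b_perp=0.14, b_par=0.75, h0=100.0):
    """Physical linking lengths for a sample of number density n_g.

    n_g is in h^3 Mpc^-3 and h0 in h km/s/Mpc. Returns the projected length
    (h^-1 Mpc), the line-of-sight length (h^-1 Mpc), and the line-of-sight
    length expressed as a velocity difference (km/s).
    """
    mean_separation = n_g ** (-1.0 / 3.0)
    l_perp = b_perp * mean_separation
    l_par = b_par * mean_separation
    return l_perp, l_par, h0 * l_par


# Hypothetical density of 0.008 h^3 Mpc^-3, i.e. a mean separation of 5 h^-1 Mpc.
l_perp, l_par, dv = linking_lengths(0.008)
```

For this illustrative density the linking cylinder has a radius of 0.7 $h^{-1}$ Mpc and a half-height of 3.75 $h^{-1}$ Mpc, or 375 km/s.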

Previous FoF group analyses have used different linking
lengths. For example, Eke et al. (2004) adopt $b_\perp = 0.13$,
$b_\parallel = 1.43$ in their analysis of groups in the 2dF
Galaxy Redshift Survey (2dFGRS; Colless et al. 2001).
With a similar transverse linking length but much larger
line-of-sight linking length than used here, this parameter
combination yields unbiased projected sizes and velocity
dispersions, but it overpredicts the abundances
of halos by 20-30% at large multiplicities (see Figure 13).
These groups are thus poorly suited to our
primary objective of using group abundances as a cosmological
test. Yang et al. (2005) and Weinmann et al.
(2006) use a group-finder that assumes a mass, radius,
and velocity dispersion for each preliminary group and 
then includes or discards galaxies from the group based 
on these assumed properties (similar to a matched fil- 
ter technique). This method might, in principle, be able 
to simultaneously recover groups with unbiased abun- 
dances, projected sizes, and velocity dispersions - at the 
expense of model independence - but this remains to be 
tested. 

5. INCOMPLETENESS 

There are two main sources of incompleteness that will
affect the richnesses of groups, and hence the multiplicity
function, in our SDSS group catalogs: fiber collisions
and survey edges. Both these effects will prevent galaxies
from being included in some groups, and thus cause
the richness of these groups to be underestimated. These
sources of incompleteness and their effects on the measured
group multiplicity function must be accounted for.

Fig. 4. Effect of fiber collisions on the group multiplicity
function measured using mock SDSS galaxy catalogs, which are
described in § 3. The top panel shows the differential group multiplicity
function for mock catalogs that contain no fiber collisions
and thus represent the "true" case (solid black curve), that lose
galaxies due to fiber collisions as in the SDSS survey (dotted blue
curve), and that are corrected for fiber collisions as described in § 5.1
(dashed red curve). The bottom panel shows the ratio of each case
to the "true" one. The shaded region encloses ±10% deviations
from the "true" multiplicity function. These results are averaged
over all of our .Mr20 mock catalogs.

5.1. Fiber Collisions 

Fiber collisions cause an incompleteness that grows
with the surface density of galaxies and is thus especially
important in group and cluster studies. Moreover, the
surface density in groups is likely a function of group richness.
The mean surface density of a group of richness $N$,
mass $M$, and radius $R$ scales like $\Sigma \sim N/R^2 \sim N/M^{2/3}$.
For a power-law relation between mean richness and halo
mass, $N \sim M^{\alpha}$, the surface density is $\Sigma \sim N^{1-2/(3\alpha)}$. This
scaling relation is clearly a crude approximation, but it
illustrates that the incompleteness due to fiber collisions
likely varies with group richness and can thus affect both
the amplitude and slope of the multiplicity function.
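
The surface density scaling quoted above follows in two short steps, written out here for convenience. The derivation assumes, as the relation between surface density, richness, and mass implies, that group radius scales with mass as $R \propto M^{1/3}$ (groups of fixed mean density). The values of $\alpha$ in the closing comment are illustrative choices, not values taken from this paper.

```latex
\begin{align}
R \propto M^{1/3} \;&\Rightarrow\;
  \Sigma \sim \frac{N}{R^{2}} \sim \frac{N}{M^{2/3}}, \\
N \sim M^{\alpha} \;&\Rightarrow\; M \sim N^{1/\alpha}
  \;\Rightarrow\;
  \Sigma \sim N \cdot N^{-2/(3\alpha)} = N^{\,1 - 2/(3\alpha)} .
\end{align}
% Illustrative cases: alpha = 1 gives Sigma ~ N^{1/3};
% alpha = 2/3 gives a surface density independent of N.
```

Under these assumptions the surface density, and hence the fiber collision incompleteness, increases with richness whenever $\alpha > 2/3$.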

We use the 100 LANL1-5.Mr20 mock catalogs (5 N-body
simulations × 10 HOD realizations × 2 mocks per
simulation cube) to assess the impact of fiber collisions
on the group multiplicity function. We apply the group-finder
described in § 4 to the "uncorrected" and "true"
versions of these mock catalogs and measure the resulting
multiplicity functions. Figure 4 shows these multiplicity
functions averaged over all the mock catalogs. The figure
shows that dropping collided galaxies from the sample
lowers the amplitude of the multiplicity function by more
than 10% and also slightly changes its slope. The amplitude
drops because some groups in each richness bin lose
galaxies and are thus shifted to lower $N$ bins. There are
also some groups from higher $N$ bins that are shifted into
these bins, but their number is smaller than the number
of groups lost because the abundance of groups drops
steeply with increasing $N$.

Zehavi et al. (2005) show that the effect of fiber collisions
on the galaxy two-point correlation function can
be successfully corrected for by including each collided
galaxy at the redshift of its nearest neighbor. We apply
the same correction to our mock catalogs to produce a
set of "corrected" mocks. Figure 4 shows that this correction
works very well in the regime $N \geq 10$, and we
therefore adopt it for our group identification.
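
The nearest-neighbor correction can be sketched as follows. The sketch assumes that collided galaxies are flagged by a missing (NaN) redshift and that "nearest neighbor" means the galaxy with a measured redshift at the smallest angular separation; both conventions, and the function name, are assumptions made for illustration.

```python
import numpy as np


def assign_collided_redshifts(ra_deg, dec_deg, z):
    """Give each galaxy without a redshift the redshift of its nearest
    neighbor on the sky that has one (hypothetical helper)."""
    ra = np.radians(np.asarray(ra_deg, dtype=float))
    dec = np.radians(np.asarray(dec_deg, dtype=float))
    z = np.array(z, dtype=float)  # copy, so the input is left untouched
    unit = np.column_stack([np.cos(dec) * np.cos(ra),
                            np.cos(dec) * np.sin(ra),
                            np.sin(dec)])
    has_z = ~np.isnan(z)
    donors = np.nonzero(has_z)[0]
    for i in np.nonzero(~has_z)[0]:
        # The largest dot product of unit vectors is the smallest angle.
        nearest = donors[np.argmax(unit[donors] @ unit[i])]
        z[i] = z[nearest]
    return z


# Toy example: the second galaxy has no redshift and sits next to the first.
z_corrected = assign_collided_redshifts([10.0, 10.01, 20.0],
                                        [0.0, 0.0, 0.0],
                                        [0.05, float("nan"), 0.08])
```

In the toy example the collided galaxy inherits z = 0.05 from its close neighbor rather than z = 0.08 from the distant one.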

5.2. Survey Edges 

Groups that are identified near the edges of a given 
sample could be missing galaxies that are located just 
outside the sample. Similar to fiber collisions, edge ef- 
fects always shift groups from higher to lower richness. 
Moreover, large and extended groups have a higher prob- 
ability of being affected by edges than do small and com- 
pact groups because they can straddle an edge while be- 
ing further away from it. Edge effects are most severe 
when the ratio of a sample's surface area to its enclosed 
volume is high. Figure 1 shows that the SDSS sample
has a highly irregular footprint on the sky, which implies 
a high surface-to-volume ratio. Edge effects are, there- 
fore, potentially severe in our samples. When the SDSS 
survey is complete and the gap in the North Galactic cap 
is filled in, edge effects will be much less important. 

We can measure the effects of edges using our mock cat- 
alogs, since we know what galaxies lie on the other side 
of edges. For every group identified in our LANL1-5.Mr20
mock catalogs, we determine how many galaxies are missing
due to edges. An edge can lie either in the perpendicular
direction, or along the line-of-sight due to a sample's
redshift limits. 

The solid curve in the right panel of Figure 5 shows
the fraction of mock groups that are missing one or more
galaxies due to edges, as a function of group richness
$N$. The affected fraction climbs from 10% to 40% as
$N$ goes from 5 to 50. Edges clearly affect a large fraction
of high richness groups in our sample, but counting
a group as affected if it loses only a single galaxy is a
very conservative test. It makes more sense to calculate
the fraction of groups that lose a fixed fraction of their
galaxies, rather than just a single galaxy. The dashed
curve in the same panel shows the fraction of groups
that lose 25% or more of their galaxies. The affected
fraction defined this way is ~ 10%, roughly independent
of richness. Figure 6 shows the effect of edges on the
multiplicity function (blue curve). The effect of edges on
the abundance of mock groups grows from zero at $N = 2$
to approximately 20% at $N = 50$. It is, therefore, very
important to correct for edges, since they systematically
change the shape of the multiplicity function and, hence,
the derived HOD.

Fig. 5. Fraction of groups affected by survey edges, measured using mock SDSS galaxy catalogs (described in § 3). Groups are
considered affected by edges if they lose any galaxies that would have been included in the absence of edges. The panels show the edge
fraction of groups in bins of the distance from their centroids to the closest edge $r_{\rm edge}$ (left panel) and group richness $N$ (right panel). The
right panel also shows the fraction of groups that lose more than 25% of their member galaxies due to edges (dashed curve). These results
are averaged over four independent .Mr20 mock catalogs.

We measure the shortest distance of every galaxy from
the survey edges by laying down points around each
galaxy at successively larger radii and checking if they 
also lie within our sample volume. The smallest radius 
at which points fall outside the sample volume is the 
distance of the galaxy from the edge. Any group that 
contains at least one galaxy within a linking length from 
the edge, whether it is a projected linking length in the 
tangential direction or a line-of-sight linking length in 
the redshift direction, is potentially affected, since there 
could be galaxies on the other side that would be linked 
to the same group. One possible way to deal with edges 
is to throw out all such groups. This is a very conser- 
vative solution, since it ensures that all groups in our fi- 
nal sample are uncontaminated by edges. However, it is 
tricky to estimate the new effective volume of the sample, 
which is necessary for measuring the multiplicity func- 
tion. Moreover, the effective volume for large groups will 
be smaller than that for small groups. Another possibil- 
ity is to keep all groups, but somehow correct the mul- 
tiplicities of those that are potentially affected by edges. 
This solution has the advantage that no groups are lost, 
but it is once again difficult to estimate the effective vol- 
ume of the sample, even if all multiplicity corrections are 
exactly right. A third possibility is to reject all groups 
whose centers lie less than a minimum distance from the 
edge. This correction has the advantage that it produces 
an unbiased sample and it is simple to estimate the new 
effective volume. However, it is important to use the 
correct minimum distance. If it is too small, then the 
correction will not work for the largest groups; if it is too 
big, then we will unnecessarily reduce our sample size. 
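
The edge-distance measurement described at the start of this paragraph (points laid down at successively larger radii until one falls outside the sample) can be sketched in a simplified setting. The sketch works in a flat two-dimensional plane rather than on the sky, and the membership test `in_sample`, the list of trial radii, and the number of points per ring are hypothetical inputs.

```python
import numpy as np


def edge_distance(x, y, in_sample, radii, n_points=64):
    """Smallest trial radius at which a ring of points around (x, y)
    leaves the sample; returns infinity if none of the radii does."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    for r in radii:
        ring_x = x + r * np.cos(angles)
        ring_y = y + r * np.sin(angles)
        if not all(in_sample(px, py) for px, py in zip(ring_x, ring_y)):
            return float(r)
    return float("inf")


def in_square(px, py):
    """Toy survey footprint: a 10 x 10 square (hypothetical)."""
    return 0.0 <= px <= 10.0 and 0.0 <= py <= 10.0


trial_radii = np.arange(0.1, 5.0, 0.1)
d_near = edge_distance(1.05, 5.0, in_square, trial_radii)  # close to the left edge
d_far = edge_distance(5.0, 5.0, in_square, trial_radii)    # at the center
```

The resolution of the estimate is set by the spacing of the trial radii and by the number of points on each ring; the result is an upper bound on the true distance to the edge.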

The left panel of Figure 5 shows the fraction of mock
groups that are missing one or more galaxies due to
edges, as a function of the distance from the group centroid
to the edge. The fraction drops from 20% at 100
Kpc to 5% at 500 Kpc and less than 1% at 1 Mpc. It
does not go to zero at larger distance because there are
groups with high velocity dispersion that can be far from
the edge and still have galaxies within a linking length
of the outer or lower redshift limit of our sample. This
figure suggests that if we set the minimum distance to
500 Kpc in the tangential direction and 500 km/s in the
redshift direction, we should eliminate most groups that
are affected by edges. We make this correction on our
mock group catalogs, and the number of groups in the
resulting catalog is reduced by ~ 22% on average. We
estimate the new effective volume of each group catalog
by scaling the original volume by the fraction of groups
that survive the edge cut. This estimate, though not exactly
accurate, is simple to make and adequate for our
purposes. Figure 6 shows that this correction results
in a multiplicity function that is not biased by edges
(dashed red curve).
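
The edge cut and the accompanying volume rescaling amount to a few lines. In the sketch below the argument names and the toy distances are hypothetical; the thresholds are the 500 Kpc and 500 km/s values quoted above.

```python
import numpy as np


def apply_edge_cut(r_edge_kpc, v_edge_kms, volume,
                   min_r_kpc=500.0, min_v_kms=500.0):
    """Keep groups whose centers are far enough from every edge.

    r_edge_kpc: tangential distance of each group center from the nearest edge.
    v_edge_kms: distance of each group center from the nearest redshift limit.
    Returns the boolean mask of surviving groups and the effective volume,
    estimated as the original volume times the surviving fraction of groups.
    """
    keep = ((np.asarray(r_edge_kpc, dtype=float) >= min_r_kpc)
            & (np.asarray(v_edge_kms, dtype=float) >= min_v_kms))
    effective_volume = volume * keep.mean()
    return keep, effective_volume


# Toy example with four groups: two fail the cut, so the volume is halved.
keep, v_eff = apply_edge_cut([100.0, 600.0, 800.0, 2000.0],
                             [1000.0, 400.0, 900.0, 3000.0],
                             volume=1000.0)
```

The multiplicity function is then measured from the surviving groups only and normalized by the effective volume.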

Our mock catalog tests show that we can deal with 
survey edges effectively if we measure the multiplicity 
function after eliminating all groups whose centers (esti- 
mated as the centroids of their member galaxy positions) 
lie less than 500 Kpc from an edge in the tangential di- 
rection or less than 500 km/s from an edge in the radial 
direction. Applying this edge cut to the Mr20, Mr19,
and Mr18 SDSS group catalogs reduces the numbers of
groups by 22.0%, 30.2%, and 41.1%, respectively. Our 
measurement of the multiplicity function for these sam- 
ples includes this correction, though the group catalogs 
that we present include all groups. 

6. GROUP AND CLUSTER CATALOG 

We apply our group-finding algorithm to the three
volume-limited samples described in § 2 and obtain three
group catalogs. The fractions of ungrouped, isolated
galaxies are 43.7%, 41.2%, and 39.8% for the Mr20,
Mr19, and Mr18 samples, respectively. The fractions of
galaxies grouped in pairs are 19.1%, 18.3%, and 17.9%.
The remaining 37.2%, 40.6%, and 42.3% of galaxies are
in groups of three or more members. Samples Mr20,
Mr19, and Mr18 contain a total of 4107, 2684, and 1357
groups with richness $N \geq 3$, respectively.

Figure 7 shows an equatorial slice with groups identified
from sample Mr20. The slice is 4° thick and each
point shows the RA and redshift of a group with $N \geq 3$.
A comparison of this figure to Figure 2 shows that groups
and clusters trace the large-scale structure of galaxies,
as expected. Larger groups are preferentially located in
higher density regions, whereas smaller groups are more
uniformly distributed. It is striking that the majority
of very large groups reside within the large supercluster
at z = 0.08. Figure 8 shows the same slice, but with
points representing the positions of member galaxies in
$N \geq 3$ groups. A visual inspection of the figure shows
that group velocity dispersions, which are responsible for
the finger-of-God effect, are largest in the most luminous
groups.

Fig. 6. Effect of survey edges on the group multiplicity function
measured using mock SDSS galaxy catalogs (described in § 3).
The figure shows the group multiplicity function for mock catalogs
that contain no edge effects and thus represent the "true" case
(solid black curve), that contain edge effects as in the SDSS survey
(dotted blue curve), and that are corrected for edge effects as
described in § 5.2 (dashed red curve). All other features are as in Fig. 4.
These results are averaged over all of our .Mr20 mock catalogs.

For each group, we compute an unweighted group centroid, which consists of a group right ascension, declination,
and mean redshift. We compute a total group luminosity that is the sum of the luminosities of its member
galaxies. Since we are dealing with volume-limited samples, the luminosity of a given group in samples Mr20,
Mr19, and Mr18 only counts galaxies with absolute magnitudes brighter than -19.9, -19, and -18, respectively. For
example, for the Mr20 sample, the total group absolute magnitude is

$$M_{r20} = -2.5 \log\left[\sum_{i} 10^{-0.4\,M_{^{0.1}r,i}}\right], \qquad (6)$$

and it is equivalent to integrating the galaxy luminosity
function within the group from $M_{^{0.1}r} = -19.9$ to $-\infty$.
Note that we compute these group absolute magnitudes using the altered absolute magnitudes for galaxies that
do not have measured redshifts due to fiber collisions. We also compute a total group color, which is
simply defined as $(g - r)_{20} = M_{g20} - M_{r20}$. We compute
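a group one-dimensional velocity dispersion given by

$$\sigma_v = \frac{1}{1+\bar{z}} \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left(cz_i - c\bar{z}\right)^2}, \qquad (7)$$

and an rms projected group radius given by

$$R_{\perp,{\rm rms}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} r_i^2}, \qquad (8)$$

where $r_i$ is the projected distance between each member
galaxy and the group centroid.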
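The group properties defined above (total absolute magnitude, one-dimensional velocity dispersion, and rms projected radius) can be sketched in a few lines of Python. This is an illustrative sketch, not the code used to build the catalogs. The function names are hypothetical. The velocity dispersion estimator, with its N - 1 normalization and division by (1 + mean redshift), is an assumption. The projected distances r_i are taken as given inputs rather than derived from sky coordinates.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s


def total_abs_magnitude(member_mags):
    """Total absolute magnitude obtained by summing member luminosities."""
    mags = np.asarray(member_mags, dtype=float)
    return -2.5 * np.log10(np.sum(10.0 ** (-0.4 * mags)))


def velocity_dispersion(redshifts):
    """One-dimensional velocity dispersion in km/s.

    Assumed estimator: scatter of cz about the mean redshift with an
    N - 1 normalization, divided by (1 + mean z).
    """
    z = np.asarray(redshifts, dtype=float)
    zbar = z.mean()
    scatter = np.sqrt(np.sum((C_KMS * (z - zbar)) ** 2) / (z.size - 1))
    return scatter / (1.0 + zbar)


def rms_projected_radius(r_perp):
    """Root-mean-square projected distance of members from the centroid."""
    r = np.asarray(r_perp, dtype=float)
    return np.sqrt(np.mean(r ** 2))
```

Two equally luminous members, for example, give a group that is brighter than either member by 2.5 log 2 ≈ 0.75 mag.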

In the three portions of Table 3, we present the groups and clusters with N ≥ 3, selected from samples Mr20,
Mr19, and Mr18. For each group, we list a group ID (column 1); the (J2000) right ascension and declination
of the group centroid (columns 2, 3); the mean redshift of the group (column 4); the group richness N (column 5);
the total r-band absolute magnitude of the group, $M_{r20}$ (column 6); the total color of the group, $(g - r)_{20}$
(column 7); the line-of-sight velocity dispersion of the group, $\sigma_v$ (column 8); the projected rms radius of the
group, $R_{\perp,{\rm rms}}$ (column 9); and the perpendicular distance of the group center from the survey edge,
$r_{\rm edge}$ (column 10). The groups in each portion of Table 3 are ranked in decreasing order of richness N. We show
the first few rows of each portion of the table in the text and make the entire table available in the electronic
version of the journal, as well as at http://cosmo.nyu.edu/aberlind/Groups.

In Table 4, we present the member galaxies of the groups listed in Table 3. For each galaxy we list the ID
of the group to which it belongs (column 1); the (J2000) right ascension and declination (columns 2, 3); the
redshift (column 4); the r-band absolute magnitude $M_{^{0.1}r}$ (column 5); the $^{0.1}(g - r)$ color (column 6);
a fiber collision flag that is equal to 0 if the galaxy has its own measured redshift and 1 if it has been given the
redshift of its nearest neighbor (column 7); and the perpendicular distance of the galaxy from the survey edge,
$r_{\rm edge}$ (column 8). As before, we show the first few rows of each portion of Table 4 in the text and make the
entire table available in the electronic version of the journal, as well as at
http://cosmo.nyu.edu/aberlind/Groups.
Galaxies without measured redshifts due to fiber collisions
are assigned the absolute magnitude of their nearest neighbor, as
described in the text.
Fig. 8. — 4° thick equatorial slice showing galaxy groups in the Mr20 volume-limited sample. Each point shows the location of a group
of richness N ≥ 3. Points have a size proportional to group richness N and a color that encodes their total r-band luminosity
$L_{r20}$ (defined in the text) in units of $L_*$ (where we adopt $M_* = -20.44$), as listed in the legend.

Fig. 9. — Same as Fig. 8, except that points show the locations of member galaxies in groups of richness N ≥ 3.
Table 3. Group and Cluster Catalogs for Samples Mr20, Mr19, and Mr18

ID      RA           DEC          z        N    M_r20    (g-r)_20  σ_v     R_⊥,rms     r_edge
        (deg)        (deg)                                         (km/s)  (h^-1 Mpc)  (h^-1 Mpc)

Mr20
33974   239.580740    27.312343   0.08797  132  -25.920  0.946     723.7   1.371       17.7
16089   247.172589    40.164633   0.03057   97  -25.468  0.891     661.1   1.318       89.3
 8817   358.535971   -10.372017   0.07405   61  -25.190  0.921     736.0   0.734       17.9
14552   183.450292    59.266666   0.09386   51  -24.861  0.808     338.3   1.079       22.9
12289   159.824898     4.987457   0.06815   51  -24.859  0.899     661.4   1.161       47.1
 3025   195.700154    -2.627141   0.08183   49  -24.805  0.911     377.1   1.247       57.9
20593   169.362355    54.469262   0.06907   49  -24.831  0.906     426.4   1.202       35.5

Mr19
 9501   246.963120    40.182569   0.03009  197  -25.839  0.886     588.7   1.317       88.2
 4915    10.447791    -9.381301   0.05543   95  -25.068  0.927     572.4   0.981       38.8
 4634   329.333792    -7.765802   0.05727   86  -25.016  0.724     564.0   0.677       52.5
10986    14.231949    -0.655097   0.04378   86  -24.944  0.935     385.4   1.076        5.2
 5585   351.303515    14.909898   0.04113   83  -24.622  0.871     496.8   1.045       53.2
 3709   214.187113     1.962572   0.05333   81  -24.902  0.887     368.3   1.160       42.9
11585    18.686704     0.254973   0.04442   68  -24.704  0.903     386.8   0.744       27.0

Mr18
 4792   247.062059    40.107520   0.03011  311  -25.934  0.865     584.2   1.300       90.5
 2748   351.183638    14.580962   0.04128  152  -25.057  0.903     446.6   1.014       72.3
 6984   173.640705    49.042739   0.03270   65  -24.086  0.918     526.2   0.533       45.7
 1968   220.146510     3.491413   0.02680   54  -23.853  0.946     274.1   0.506       23.6
 5607    14.274495    -0.247149   0.04303   52  -24.066  0.915     309.0   0.760       13.0
 5948    18.760997     0.307893   0.04326   49  -24.108  0.876     264.9   0.659       26.5
 5692    51.279369    -0.496506   0.03664   48  -23.871  0.870     246.1   0.802       44.6

Note — The rest of the table can be found in the electronic version of the ApJ, or at http://cosmo.nyu.edu/aberlind/Groups



Table 4. Member Galaxies of Groups and Clusters for Samples Mr20, Mr19, and Mr18

groupID  RA           DEC         z        M_0.1r   ^0.1(g-r)  fibcol  r_edge
         (deg)        (deg)                                            (h^-1 Mpc)

Mr20
14       196.769894   -0.039161   0.08086  -20.168  0.945      1       72.3
14       196.799107   -0.024688   0.08051  -20.498  0.918      0       72.3
14       196.788454   -0.029741   0.08086  -20.168  0.945      1       72.3
14       196.779246   -0.038656   0.08086  -20.168  0.945      0       72.3
15       197.264020   -0.053520   0.07962  -20.302  0.457      0       72.4
15       197.207327    0.047123   0.07987  -19.950  0.895      0       72.4
15       197.165432    0.102322   0.08016  -20.467  0.872      0       72.4

Mr19
1        169.180550   -0.213320   0.03917  -19.355  0.752      0       13.5
1        169.195964   -0.100215   0.03898  -19.315  0.584      0       13.5
1        169.387065   -0.187503   0.03999  -20.762  0.967      0       13.5
5        199.555960   -0.148218   0.04825  -19.267  0.321      0       65.9
5        199.656619   -0.226944   0.04731  -19.705  0.960      0       65.9
5        199.665084   -0.175183   0.04708  -20.975  0.976      1       65.9
5        199.679052   -0.178932   0.04708  -20.975  0.976      0       65.9
5        199.671638   -0.173772   0.04708  -20.975  0.976      1       65.9

Mr18
1        194.342587   -0.630508   0.02247  -18.821  0.744      1       57.7
1        194.353591   -0.622488   0.02247  -18.821  0.744      0       57.7
1        194.313130   -0.657646   0.02295  -18.837  0.894      0       57.7
2        169.180550   -0.213320   0.03917  -19.355  0.752      0       13.4
2        169.195964   -0.100215   0.03898  -19.315  0.584      0       13.4
2        169.387065   -0.187503   0.03999  -20.762  0.967      0       13.4
2        169.300864   -0.189302   0.03972  -18.203  0.819      0       13.4

Note — The rest of the table can be found in the electronic version of the ApJ, or at http://cosmo.nyu.edu/aberlind/Groups
Fig. 10. — Differential group multiplicity function for groups
identified in the SDSS Mr20 volume-limited sample. The different
curves are $n_{\rm grp}(N)$ uncorrected for incompleteness (dotted blue
curves), corrected for incompleteness due to fiber collisions (dashed
red curves), and corrected for both fiber collisions and edge effects
(solid black curves). The bottom panel shows the ratio of each case
to the fully corrected one. The shaded region encloses ±10% deviations
from the fully corrected multiplicity function. These results
are averaged over all of our .Mr20 mock catalogs.

7. MULTIPLICITY FUNCTION 

With group catalogs in hand, we can now measure the group multiplicity function. The differential group
multiplicity function, $n_{\rm grp}(N)$, is defined as the number density of groups in bins of richness N, where
richness bins can have a width of unity or more. Before computing $n_{\rm grp}(N)$, we must make the corrections
for incompleteness described above. Though the catalogs presented in the previous section already include the
fiber collision correction, we also compute the multiplicity function from an alternate Mr20 group catalog that
does not include this correction in order to see the magnitude of the correction. Figure 10 shows this uncorrected
multiplicity function, as well as the multiplicity function that includes the fiber collision correction. The figure
shows that applying the correction boosts the amplitude of the multiplicity function, just as it did in our mock
tests. Figure 10 also shows the effect on the multiplicity function of applying the edge correction described above.
This effect is small, typically less than 5%, though it is larger in individual bins at high N, where the number of
groups is small.

Table 5. Group Multiplicity Function for Mr20 Sample

N_min-N_max   n_grp(N)         σ_n_grp          σ_n_grp (Poisson)

3-3           2.290 × 10^-4    1.110 × 10^-5    5.881 × 10^-6
4-4           1.054 × 10^-4    4.890 × 10^-6    3.990 × 10^-6
5-5           4.909 × 10^-5    4.181 × 10^-6    2.723 × 10^-6
6-6           3.263 × 10^-5    4.465 × 10^-6    2.220 × 10^-6
7-7           1.962 × 10^-5    1.979 × 10^-6    1.722 × 10^-6
8-8           1.496 × 10^-5    2.250 × 10^-6    1.503 × 10^-6
9-9           1.118 × 10^-5    2.398 × 10^-6    1.299 × 10^-6
10-10         8.906 × 10^-6    1.502 × 10^-6    1.160 × 10^-6
11-11         5.139 × 10^-6    1.292 × 10^-6    8.810 × 10^-7
12-12         4.223 × 10^-6    8.632 × 10^-7    7.986 × 10^-7
13-13         3.780 × 10^-6    7.200 × 10^-7    7.555 × 10^-7
14-14         2.565 × 10^-6    1.283 × 10^-6    6.224 × 10^-7
15-15         2.873 × 10^-6    9.335 × 10^-7    6.587 × 10^-7
16-16         2.868 × 10^-6    1.165 × 10^-6    6.581 × 10^-7
17-17         1.361 × 10^-6    6.868 × 10^-7    4.533 × 10^-7
18-18         1.358 × 10^-6    4.131 × 10^-7    4.530 × 10^-7
19-19         1.209 × 10^-6    5.133 × 10^-7    4.273 × 10^-7
20-21         9.817 × 10^-7    3.079 × 10^-7    3.851 × 10^-7
22-24         6.039 × 10^-7    2.253 × 10^-7    3.020 × 10^-7
25-28         3.401 × 10^-7    9.522 × 10^-8    2.266 × 10^-7
29-30         9.061 × 10^-7    4.483 × 10^-7    3.699 × 10^-7
31-34         3.398 × 10^-7    7.501 × 10^-8    2.265 × 10^-7
35-42         1.699 × 10^-7    6.455 × 10^-8    1.602 × 10^-7
43-61         6.360 × 10^-8    2.982 × 10^-8    9.801 × 10^-8

Note — n_grp and σ_n_grp are in units of h^3 Mpc^-3.

We must calculate errorbars for the multiplicity function in order to use it to constrain the HOD. We use our
mock catalogs for this purpose. Specifically, we compute fractional errors from the dispersion among 10 independent
mock catalogs for the Mr20 sample (LANL1-5.Mr20 mocks × 1 HOD realization × 2 mocks per simulation
cube), and 8 mock catalogs for each of the Mr19 and Mr18 samples (LANL1-4.Mr19/LANL1-4.Mr18 mocks ×
1 HOD realization × 2 mocks per simulation cube). Note that we do not use multiple HOD realizations because
the underlying halo populations themselves would not be independent. Before computing errors, we correct each
mock catalog for fiber collisions and edge effects in the same way as in the data. The computed errors thus
implicitly include any contribution from these correction procedures.
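The mock-based error estimate described in this section can be illustrated with a short Python sketch. It assumes the mock multiplicity functions are stored as an array with one row per mock catalog and one column per richness bin. The function names and the use of the sample standard deviation (ddof=1) are assumptions for illustration.

```python
import numpy as np


def mock_fractional_errors(mock_ngrp):
    """Fractional error per richness bin from the scatter among mocks.

    mock_ngrp has shape (n_mocks, n_bins); each row is the multiplicity
    function measured in one independent mock catalog.
    """
    mock_ngrp = np.asarray(mock_ngrp, dtype=float)
    mean = mock_ngrp.mean(axis=0)
    scatter = mock_ngrp.std(axis=0, ddof=1)  # sample standard deviation (assumed)
    return scatter / mean


def data_errors(data_ngrp, mock_ngrp):
    """Scale the mock fractional errors to the measured multiplicity function."""
    return np.asarray(data_ngrp, dtype=float) * mock_fractional_errors(mock_ngrp)
```

Working with fractional rather than absolute errors lets the mock scatter be applied to the measured abundances even when the mock and observed amplitudes differ slightly.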

Table 6. Group Multiplicity Function for Mr19 Sample

N_min-N_max   n_grp(N)         σ_n_grp          σ_n_grp (Poisson)

3-3           4.514 × 10^-4    2.872 × 10^-5    1.545 × 10^-5
4-4           1.889 × 10^-4    1.201 × 10^-5    9.996 × 10^-6
5-5           1.085 × 10^-4    9.323 × 10^-6    7.575 × 10^-6
6-6           6.292 × 10^-5    8.977 × 10^-6    5.769 × 10^-6
7-7           5.027 × 10^-5    5.465 × 10^-6    5.157 × 10^-6
8-8           2.856 × 10^-5    2.434 × 10^-6    3.887 × 10^-6
9-9           1.853 × 10^-5    2.832 × 10^-6    3.131 × 10^-6
10-10         1.534 × 10^-5    2.799 × 10^-6    2.849 × 10^-6
11-11         1.534 × 10^-5    2.577 × 10^-6    2.849 × 10^-6
12-12         1.164 × 10^-5    2.236 × 10^-6    2.482 × 10^-6
13-13         8.994 × 10^-6    2.135 × 10^-6    2.181 × 10^-6
14-14         7.936 × 10^-6    2.105 × 10^-6    2.049 × 10^-6
15-15         5.819 × 10^-6    1.186 × 10^-6    1.755 × 10^-6
16-16         5.819 × 10^-6    1.718 × 10^-6    1.755 × 10^-6
17-18         5.819 × 10^-6    1.318 × 10^-6    1.755 × 10^-6
19-20         2.380 × 10^-6    5.168 × 10^-7    1.122 × 10^-6
21-23         2.292 × 10^-6    5.243 × 10^-7    1.101 × 10^-6
24-26         1.587 × 10^-6    4.621 × 10^-7    9.164 × 10^-7
27-32         7.054 × 10^-7    2.228 × 10^-7    6.109 × 10^-7
33-38         7.054 × 10^-7    3.069 × 10^-7    6.109 × 10^-7
39-51         3.256 × 10^-7    4.634 × 10^-8    4.151 × 10^-7
52-86         1.209 × 10^-7    3.602 × 10^-8    2.529 × 10^-7

Note — Same units as Table 5.

Table 7. Group Multiplicity Function for Mr18 Sample

N_min-N_max   n_grp(N)         σ_n_grp          σ_n_grp (Poisson)

3-3           7.311 × 10^-4    6.909 × 10^-5    4.000 × 10^-5
4-4           3.436 × 10^-4    3.325 × 10^-5    2.742 × 10^-5
5-5           1.948 × 10^-4    2.200 × 10^-5    2.065 × 10^-5
6-6           1.248 × 10^-4    1.629 × 10^-5    1.652 × 10^-5
7-7           1.182 × 10^-4    1.546 × 10^-5    1.608 × 10^-5
8-8           5.686 × 10^-5    9.917 × 10^-6    1.116 × 10^-5
9-9           3.284 × 10^-5    5.340 × 10^-6    8.477 × 10^-6
10-10         3.066 × 10^-5    5.777 × 10^-6    8.191 × 10^-6
11-11         2.626 × 10^-5    8.403 × 10^-6    7.581 × 10^-6
12-13         1.423 × 10^-5    1.629 × 10^-6    5.580 × 10^-6
14-15         8.756 × 10^-6    1.443 × 10^-6    4.378 × 10^-6
16-17         1.203 × 10^-5    1.761 × 10^-6    5.132 × 10^-6
18-23         3.647 × 10^-6    7.402 × 10^-7    2.825 × 10^-6
24-31         2.188 × 10^-6    6.091 × 10^-7    2.188 × 10^-6
32-152        1.447 × 10^-7    1.673 × 10^-8    5.627 × 10^-7

Note — Same units as Table 5.

The SDSS multiplicity function shown in Figure 10 becomes very noisy at high richness because the abundance
of groups drops with N and the figure uses richness bins with a width of unity. It makes more sense to increase the
bin width with N so as to beat down the noise. Moreover, since we calculate errorbars for the multiplicity function
using our mock catalogs, each richness bin must contain enough mock groups so that an errorbar can be reliably
estimated. We choose richness bins for each group catalog so that each bin contains at least eight SDSS groups
and twenty mock groups (among all mock catalogs used). At low multiplicities, the bin width is always unity
because there are many groups with low N. At higher multiplicities, however, the richness bins grow wider in
order to satisfy these criteria. The bin widths for samples Mr20, Mr19, and Mr18 are listed in the first columns
of Tables 5, 6, and 7, respectively. Once a richness bin is defined, the abundance of groups in that bin,
$n_{\rm grp}(N)$, is simply the number of groups having richnesses within the bin, divided by the sample volume and
divided by the bin width. The values of $n_{\rm grp}(N)$ are listed in the second columns of Tables 5, 6, and 7. We
use the same richness bins to compute the abundance of mock groups for each independent mock catalog, and we
compute errors, $\sigma_{n_{\rm grp}}$, in the SDSS multiplicity function by measuring the dispersion among the mock
multiplicity functions. These errors are listed in the third columns of Tables 5, 6, and 7. Finally, we also compute
Poisson errors for the SDSS $n_{\rm grp}(N)$, which we list in the fourth columns of Tables 5, 6, and 7. In some of
the highest multiplicity bins, the Poisson errors are larger than the mock errors. In these cases, the mock errors
are likely underestimated and it is best to use the Poisson errors in their place.
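The binning criteria and the abundance estimate described in this section can be sketched as follows. The text states the criteria (at least eight SDSS groups and twenty mock groups per bin) but not the procedure used to satisfy them, so the greedy bin-growing loop below is an assumption, as are the function names. In this sketch, the richest groups are left unbinned if they cannot fill a final bin.

```python
import numpy as np


def richness_bins(data_N, mock_N, n_start=3, min_data=8, min_mock=20):
    """Grow inclusive richness bins (lo, hi) until both count criteria hold."""
    data_N = np.asarray(data_N)
    mock_N = np.asarray(mock_N)
    bins = []
    lo = n_start
    for hi in range(n_start, int(data_N.max()) + 1):
        n_data = np.count_nonzero((data_N >= lo) & (data_N <= hi))
        n_mock = np.count_nonzero((mock_N >= lo) & (mock_N <= hi))
        if n_data >= min_data and n_mock >= min_mock:
            bins.append((lo, hi))
            lo = hi + 1  # start the next bin just above this one
    return bins


def multiplicity_function(data_N, bins, volume):
    """Groups per bin, divided by the sample volume and by the bin width."""
    data_N = np.asarray(data_N)
    n_grp = []
    for lo, hi in bins:
        count = np.count_nonzero((data_N >= lo) & (data_N <= hi))
        n_grp.append(count / (volume * (hi - lo + 1)))
    return np.array(n_grp)
```

Here `data_N` holds the richness of every group in the observed catalog and `mock_N` holds the richnesses of groups pooled over all mock catalogs used for the error estimate.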

Fig. 11. — Differential group multiplicity functions for SDSS
groups. The three curves show $n_{\rm grp}(N)$ for groups identified in our
three volume-limited samples: Mr20, Mr19, and Mr18 (colors and
line types are listed in the top-right corner of the panel). $n_{\rm grp}(N)$
is measured in richness bins whose widths are chosen so that the
bins contain a minimum of 8 SDSS groups and 20 mock groups.
Points are placed at the mean richness of groups within each bin.
Errors are shown for the Mr20 sample and are estimated from the
dispersion among 10 independent SDSS mock catalogs.

Figure 11 shows the SDSS multiplicity functions for the three volume-limited samples, along with the mock
errorbars for the Mr20 sample. Though we measure and show the multiplicity function down to a multiplicity of
N = 3, our tests with mock catalogs have shown that it is only unbiased with respect to the true halo multiplicity
function for N ≥ 10. When using this measured multiplicity function to constrain the HOD, we must either
only use bins with N ≥ 10, or attempt to calibrate the relation between the measured group multiplicity function
and the true halo multiplicity function at lower values of N. The central curve of the corresponding figure
discussed in the Appendix effectively provides this calibration for Mr20 and the cosmology adopted in our mock
catalogs.

The multiplicity functions shown in Figure 11 appear to be close to power-law relations. In order to test this,
we perform a simple power-law fit to each multiplicity function in the regime N ≥ 10. We use only the diagonal
errors of the full covariance matrix (i.e., the errors listed in Tables 5, 6, and 7). We find that all three multiplicity
functions are well-fit by power-law relations, with best-fit slopes of -2.72 ± 0.16, -2.48 ± 0.14, and -2.49 ± 0.28
for the Mr20, Mr19, and Mr18 samples, respectively.
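A power-law fit of this kind can be sketched with a weighted straight-line fit in log-log space that uses only diagonal errors. The text does not say whether the fit was done in logarithmic or linear space, so the log-space choice, the first-order error propagation, and the function name are assumptions made for illustration.

```python
import numpy as np


def fit_power_law(N, n_grp, sigma):
    """Weighted least-squares fit of log10(n_grp) = intercept + slope * log10(N).

    Returns (slope, intercept, slope_error). Errors on n_grp are propagated
    to log space to first order: sigma_log = sigma / (n_grp * ln 10).
    """
    x = np.log10(np.asarray(N, dtype=float))
    n_grp = np.asarray(n_grp, dtype=float)
    y = np.log10(n_grp)
    sigma_log = np.asarray(sigma, dtype=float) / (n_grp * np.log(10.0))
    w = 1.0 / sigma_log ** 2

    # Standard closed-form solution for a weighted straight-line fit.
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
    delta = S * Sxx - Sx ** 2
    slope = (S * Sxy - Sx * Sy) / delta
    intercept = (Sxx * Sy - Sx * Sxy) / delta
    slope_error = np.sqrt(S / delta)
    return slope, intercept, slope_error
```

In practice `N` would be the mean richness of each bin with N ≥ 10, and `sigma` the larger of the mock and Poisson errors for that bin.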

8. SUMMARY AND DISCUSSION 

We have used a simple friends-of-friends algorithm to identify galaxy groups in volume-limited samples of
the SDSS redshift survey. We have selected FoF linking lengths that are best at grouping together galaxies
that occupy the same dark matter halos. We based this choice on extensive tests with mock galaxy catalogs,
which we constructed by populating halos in N-body simulations with galaxies. The result of our mock
tests is that no combination of perpendicular and line-of-sight linking lengths can yield groups that successfully
recover all aspects of the parent halo distribution, even for large richness systems. Specifically, FoF cannot
identify groups that simultaneously have unbiased abundances, projected sizes, and velocity dispersions. The
ideal group-finding parameters for a given study depend on its scientific objectives. Given our objective of using
the multiplicity function to constrain the HOD, it makes sense to sacrifice velocity dispersions and obtain
groups with unbiased abundances and projected sizes. Our choice of linking lengths results in a group catalog
that, for groups of ten or more members, has an unbiased multiplicity function, an unbiased median relation
between the multiplicities of groups and their parent halos, an unbiased projected size distribution as a function
of multiplicity, and a velocity dispersion distribution that is ~ 20% too low for all multiplicities. We correct for
fiber collisions and survey edge effects and present three SDSS group catalogs (for three different volume-limited
samples) and their measured multiplicity functions.

It is important to recognize that our adopted group 
finder has the above properties only for halos defined us- 
ing FoF with a linking length of 0.2 times the mean inter- 
particle separation, since this is how halos were identified 
in our mock catalogs. A different halo definition (such 
as FoF with a different linking length, or spherical over- 
density halos) would require a different set of optimal 
group-finding parameters. This is not a problem as long 
as the same halo definition is used consistently. For ex- 
ample, an HOD measured from these group catalogs will 
hold for this halo definition, and any theoretical model 
should use the same halo definition to compare its pre- 
dictions to the measured HOD. We chose this particular 
halo finder because it has been widely used and tested, 
and the properties of the resulting halo distribution (e.g., 
mass function) are well understood. 

The groups and clusters that we present here are intended to be systems of galaxies that belong to the same
virialized dark matter halo. We can test whether these systems are virialized by computing crossing times for
the groups and checking if they are sufficiently less than the Hubble time. We define the crossing time divided by
the Hubble time as

$$\frac{t_{\rm cross}}{t_H} = \frac{\left(R_{\rm rms}/h^{-1}{\rm Mpc}\right)}{\left(\sigma_v/100\,{\rm km\,s^{-1}}\right)}, \qquad (9)$$

where $R_{\rm rms}$ is the one-dimensional group radius, which is equal to the projected (two-dimensional) radius,
$R_{\perp,{\rm rms}}$, divided by the square root of two. We correct for the velocity dispersion bias revealed in our
mock tests by applying a 20% upward correction to all group velocity dispersions, and we compute
$t_{\rm cross}/t_H$ for all groups. We find that, for all three group catalogs, the median value of $t_{\rm cross}/t_H$
is ~ 0.15, and 80% of all groups have values less than ~ 0.29. These numbers can be interpreted in terms of the
spherical infall model (Gunn & Gott 1972; Gott & Turner 1977a), or other analytic or numerical models. However,
at first glance, the numbers are encouraging and suggest that most of our groups are likely virialized systems.
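As a worked illustration of the crossing-time ratio, the sketch below evaluates it from the catalog quantities. It treats the 20% upward correction as a multiplicative factor of 1.2 on the velocity dispersion, which is an assumption about how the correction is applied. The function and parameter names are hypothetical.

```python
import math


def crossing_time_ratio(r_perp_rms, sigma_v, sigma_boost=1.2):
    """Crossing time divided by the Hubble time.

    r_perp_rms  : projected rms radius in h^-1 Mpc
    sigma_v     : line-of-sight velocity dispersion in km/s
    sigma_boost : multiplicative correction for the dispersion bias (assumed 1.2)
    """
    r_rms = r_perp_rms / math.sqrt(2.0)   # one-dimensional radius
    sigma_corrected = sigma_boost * sigma_v
    return r_rms / (sigma_corrected / 100.0)
```

For the richest Mr20 group in Table 3 (projected rms radius 1.371 h^-1 Mpc, velocity dispersion 723.7 km/s), this gives a ratio of about 0.11, below the median value quoted above.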

The group and cluster catalogs presented here are well- 
suited for testing many of the predictions and assump- 
tions made by galaxy formation models regarding the 
relationship between galaxies and their underlying dark 
matter halos. We will investigate several of these issues 
in subsequent papers. 
APPENDIX

In this appendix, we describe the mock catalog tests that help us choose optimal FoF parameters. Since our
primary goal for identifying groups is to measure the group multiplicity function and use it to constrain the
HOD, we clearly require our FoF algorithm to produce groups that have an unbiased multiplicity function with
respect to the true halo multiplicity function. In addition, we require an unbiased relation between the
multiplicities of groups and their associated halos. Finally, we would like our groups to have unbiased projected size
and velocity dispersion distributions as a function of multiplicity. We create a grid of FoF linking lengths and
check how each set of linking lengths performs in the above tests, for each of the four HOD model mock cubes
(.Mr20, .Mr20b, .Mr19, .Mr18). In the case of each HOD model, we average results over the 10 HOD
realizations described in § 3 and over the LANL1 and LANL4 N-body simulations.

Fig. 12. — Effect of changing the FoF linking length on the
group multiplicity function in real space, measured using mock
galaxy catalogs (described in § 3). In the top panel, the solid black
curve shows the input halo multiplicity function for mock catalogs
and thus represents the "true" case. The other three curves show
the recovered group multiplicity functions for three different linking
lengths, which are listed at the top right of the panel in units of the
mean inter-galaxy separation. The bottom panel shows the ratio
of each case to the "true" one. The shaded region encloses ±10%
deviations from the "true" multiplicity function. These results are
averaged over all of our .Mr20 mock catalogs.

Before focusing on redshift space, we briefly examine how well FoF recovers the true multiplicity function in
real space, since this represents the best possible case (any group finder will almost certainly perform worse in
redshift space). We apply FoF to the real-space cube mocks using a single linking length (the linking volume
around each mock galaxy is a sphere), and investigate how the recovered multiplicity function varies with the
value of this linking length. In particular, we compare the mock group multiplicity functions to the input halo
multiplicity functions that were used to construct the mock catalogs. Figure 12 shows this comparison for the
.Mr20 mocks. The bottom panel of the figure shows the logarithm of the ratio of group to halo multiplicity
function, and the horizontal solid line therefore denotes the "unbiased" case. The figure reveals that, at large N, the
group multiplicity function has an unbiased shape that is independent of the choice of linking length (at least
for the range of linking lengths shown). The amplitude, however, is dependent on the linking length used, with
larger linking lengths leading to a higher abundance of groups at large N. A linking length of b = 0.2 (in units
of the mean inter-galaxy separation) yields a group multiplicity function with an unbiased amplitude at large N.
This is not surprising given that the same value was used to identify dark matter halos in the N-body simulations
while constructing mock catalogs.
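A real-space FoF run with a single linking length can be illustrated with the minimal Python sketch below. It is a brute-force O(n²) version with union-find bookkeeping, ignores periodic boundaries, and is only meant to show the linking rule. The function name and arguments are hypothetical, and this is not the group finder used in the paper.

```python
import numpy as np


def fof_groups(positions, b, volume):
    """Friends-of-friends in real space with a spherical linking volume.

    positions : (n, 3) array of galaxy coordinates
    b         : linking length in units of the mean inter-galaxy separation
    volume    : volume containing the galaxies (sets the mean separation)
    Returns an integer group label for every galaxy.
    """
    pos = np.asarray(positions, dtype=float)
    n_gal = len(pos)
    link = b * (volume / n_gal) ** (1.0 / 3.0)
    parent = list(range(n_gal))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n_gal - 1):
        d2 = np.sum((pos[i + 1:] - pos[i]) ** 2, axis=1)
        for j in np.nonzero(d2 <= link ** 2)[0] + (i + 1):
            root_i, root_j = find(i), find(int(j))
            if root_i != root_j:
                parent[root_j] = root_i  # merge the two friend networks

    roots = np.array([find(i) for i in range(n_gal)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels
```

Increasing `b` can only merge existing groups, never split them, which is why larger linking lengths raise the abundance of rich groups.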

At low N, the multiplicity function is highly biased, both in shape and amplitude. The abundance of groups
relative to halos at a given multiplicity N decreases when FoF splits these halos into smaller groups or merges them
to form larger groups. This decrease is countered by an increase due to the merging of smaller halos or the
splitting of larger halos. The balance between these competing effects determines whether the multiplicity function
is biased or not. For linking lengths near b = 0.2, merging dominates over splitting, which means that group
abundances at a given multiplicity are mainly determined by a balance between halos at that N merging to yield
larger groups and smaller halos merging to replenish the lost groups. However, this balance breaks at N = 1 because,
while FoF merges N = 1 halos (i.e., isolated galaxies) to form larger groups, there are no smaller halos that can
merge to replenish N = 1 groups. The abundance of N = 1 groups is therefore necessarily less than that of
N = 1 halos (it can only be more if the linking length is so small, approximately b ~ 0.1, that single galaxy
groups splinter off in large numbers from larger halos). Since most galaxies live in N = 1 halos (~ 70% in these
mock catalogs), merging a small fraction of them to form larger groups will fractionally increase the abundance of
larger N = 2, 3, 4, etc. groups significantly. This is seen in Figure 12: the abundance of N = 1 groups is lower
than that of halos by ~ 20% for b = 0.2, causing the abundance of N = 2 and N = 3 groups to be ~ 50%
higher. Only for N ≥ 10 does the group abundance settle down and become unbiased. This behavior is a
fundamental limitation of the FoF algorithm, and it has the consequence that group abundances can only be trusted
for large multiplicity groups.
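The bookkeeping behind this argument can be made concrete with a toy example that counts systems at each multiplicity, once for halos and once for groups. The function name and the toy labels are hypothetical.

```python
import numpy as np


def multiplicity_counts(labels, n_max):
    """Number of systems with exactly N members, for N = 0 .. n_max.

    labels assigns each galaxy to a halo or group; systems richer than
    n_max are ignored.
    """
    richness = np.bincount(np.asarray(labels))  # members per system
    richness = richness[richness > 0]
    return np.bincount(richness, minlength=n_max + 1)[: n_max + 1]


# Toy case: eight galaxies in three N = 1 halos, one pair, and one triplet.
halo_labels = [0, 1, 2, 3, 3, 4, 4, 4]
# FoF merges two of the isolated galaxies into a spurious pair.
group_labels = [0, 0, 1, 2, 2, 3, 3, 3]

halo_counts = multiplicity_counts(halo_labels, 3)
group_counts = multiplicity_counts(group_labels, 3)
```

In this toy case a single merger removes two N = 1 systems and doubles the number of N = 2 systems, while nothing can replenish the N = 1 bin. The same asymmetry drives the low-N bias described above.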

In redshift space, group finding is much more chal- 
lenging because finger-of-god distortions stretch groups 
along the line-of-sight, making it more likely that single 
halos will be split into multiple groups and that neigh- 
boring halos will be merged into the same groups. Fig- 
ure ^1 illustrates these effects by showing the perfor- 
mance of FoF in a small slice through a single mock 
catalog (one HOD realization of the LANL4.Mr20 mock 
catalog). The top- left panel shows the mock galaxies in 
real space, with each A^ > 4 halo denoted by a unique 
color. The bottom-left panel shows the same galaxies in 
redshift space, where the line-of-sight is oriented along 
the z-axis of the mock cube. Large open circles have 
radii equal to the halo virial radii and are centered at 
the halo centers in real space, and the galaxy centroids 
in redshift space. We run our adopted FoF group-finder 
(described in § 0)| on the redshift-space mock and denote 
each resulting A^ > 4 group with a unique color in the 
bottom-right panel. Finally, we show the group galaxies' 
real-space positions in the top-right panel. Large dot- 
ted circles are centered at the group centroids and have 
virial radii that are estimated by assuming a halo mass 
function and a monotonic relation between group multi- 
plicity and mass. A visual comparison of the real- and 
redshift-space panels reveals many of the failure modes 
of FoF group-finding in redshift space. The halo denoted 
by green in the left-side panels is fairly well recovered 
by FoF as the group denoted by green in the right-side 
panels. However, a couple of halo galaxies are missed in 
group finding, such as the one whose velocity moved it 
the furthest away from the center of the halo. Most of 
the galaxies in the halo denoted by blue are linked to- 
gether in the same group, also denoted by blue. However, 
many galaxies that do not belong to the "blue" halo are 
also linked to the same group. This is seen clearly in the 

Fig. 13. — Illustrated behavior of the Friends-of-Friends (FoF) group finder. Each panel shows a 40 × 40 × 10 h^-1 Mpc slice through 
a mock galaxy catalog. Moving counter-clockwise starting from the top left panel, the panels show: galaxies in dark matter halos in real 
space (top left), the same galaxies in redshift space (bottom left), galaxies in groups recovered using FoF (bottom right), and these group 
galaxies in their real-space positions (top right). In each case, galaxies in halos or groups with N > 4 are shown as colored points, with 
each halo or group represented by a unique color. Large open circles are centered on the halo or group centers and have radii equal to the 
halo virial radii (left panels) and the estimated group virial radii (right panels). 

Fig. 14. — Same as Fig. 12 but for redshift space. The solid 
black curve shows the input halo multiplicity function. The dot- 
dashed black curve shows the recovered group multiplicity function 
if a single linking length is used. The other three curves show 
the recovered multiplicity functions for fixed perpendicular and 
three different line-of-sight linking lengths, which are listed in the 
top panel in units of the mean inter-galaxy separation. All other 
features are as in Fig. 12. 

top-right panel, where seven of the "blue" group galax- 
ies' real-space positions place them well outside the halo. 
A similar thing occurs to the halos and corresponding 
groups denoted by magenta and cyan. Most of the galax- 
ies in the large "red" halo are recovered correctly into the 
"red" group, but there are some galaxies added to this 
group that do not belong to the "red" halo, as well as a 
few galaxies that do belong to that halo, but have splin- 
tered off into a different group (denoted by dark green). 
Despite these imperfections, there is clearly a substantial 
correspondence between the groups identified by FoF and 
the true population of halos in this slice. 
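
The estimated group virial radii drawn as dotted circles rest on a monotonic relation between multiplicity and mass. A minimal sketch of that idea follows, using rank-order matching against a list of halo masses. The function name, the use of a mass list as a stand-in for the halo mass function, and the defaults `omega_m=0.3` and `delta=200` (overdensity relative to the mean density) are all assumptions made for illustration, not values taken from this paper.

```python
import numpy as np

RHO_CRIT = 2.775e11  # critical density today, in h^2 Msun / Mpc^3


def virial_radii_by_rank(multiplicity, halo_masses, omega_m=0.3, delta=200.0):
    """Assign masses to groups by rank order, then convert to virial radii.

    multiplicity : number of member galaxies in each group
    halo_masses  : masses (h^-1 Msun) sampled from an assumed halo mass
                   function for the same volume; at least one per group
    Returns (mass, radius) per group, with radius in h^-1 Mpc.
    """
    multiplicity = np.asarray(multiplicity)
    masses = np.sort(np.asarray(halo_masses, dtype=float))[::-1]  # heaviest first
    order = np.argsort(-multiplicity, kind="stable")              # richest first
    mass = np.empty(len(multiplicity))
    mass[order] = masses[:len(multiplicity)]  # richest group gets heaviest halo
    rho_mean = omega_m * RHO_CRIT             # hypothetical cosmology
    radius = (3.0 * mass / (4.0 * np.pi * delta * rho_mean)) ** (1.0 / 3.0)
    return mass, radius
```

Groups with equal multiplicity receive different masses under this scheme, so the radii are only indicative for plotting.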

We now examine the relative multiplicity functions 
of groups and halos when the groups are identified in 
redshift space. If we use the same linking length in 
transverse and line-of-sight directions, finger-of-god dis- 
tortions will cause halos to be split into multiple small 
groups along the line-of-sight. This is demonstrated by 
the dashed curve in Figure 14, which shows the multi- 
plicity function of groups identified with a single linking 
length of b = 0.2. The abundance of groups is vastly 
underestimated for N > 5, and the effect grows with N 
because richer halos have higher velocity dispersions. We 
therefore need to use different linking lengths in the line- 
of-sight and perpendicular directions. We apply FoF to 



Fig. 15. — Effect of changing the FoF linking lengths on the 
relation between the distributions of input halo richness and recov- 
ered group richness in redshift space, measured using mock galaxy 
catalogs. Each input halo is matched one-to-one to a recovered 
group whenever possible; however, some halos have no correspond- 
ing group and some groups have no one-to-one parent halo. The 
top panel shows the halo completeness as a function of halo rich- 
ness, i.e., the fraction of halos at each richness that can be matched 
one-to-one with a recovered group. The middle panel shows the 
spurious fraction of groups as a function of group richness, i.e., 
the fraction of groups at each richness that cannot be matched 
one-to-one with a parent halo. The bottom panel shows the re- 
lation between halo and group richness for halos and groups that 
are matched one-to-one. Middle curves show the median relation 
and outer curves show the 10 and 90 percentiles (they enclose 80% 
of the group-halo pairs). The area between these outer curves is 
shaded. In all panels, different line types and colors show fixed 
perpendicular and different line-of-sight linking lengths, which are 
listed in the top panel in units of the mean inter-galaxy separation. 
To avoid confusion, the 10 and 90 percentile curves (as well as the 
shading between them) in the bottom panel are only shown for one 
of the linking length combinations. All results are averaged over 
twenty mock galaxy catalogs. 

our redshift-space cube mocks for a grid of perpendicular 
and line-of-sight linking lengths and find that we can re- 
cover an unbiased multiplicity function at large N for the 
right combinations of linking lengths. Figure 14 shows 
one such combination (b⊥ = 0.14, bz = 0.75) and demon- 
strates how the group multiplicity function changes with 
the line-of-sight linking length bz. Generally, larger link- 
ing lengths in either direction lead to a higher abundance 
of groups at large N. We record all linking length combi- 
nations that yield unbiased multiplicity functions in the 
large N regime and show the successful parameter space 
in Figure 3, as discussed in § 0. 
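
The use of separate perpendicular and line-of-sight linking lengths can be illustrated with a short sketch for a periodic cube whose z-axis is the line of sight. This is a simplified O(N^2) illustration, not the group finder used in the paper. The function name and the cylinder-style test (projected separation and line-of-sight separation checked independently) are assumptions for the cube geometry.

```python
import numpy as np


def fof_redshift_space(pos, box, b_perp, b_z):
    """Group galaxies in a periodic cube whose z-axis is the line of sight.

    pos    : (N, 3) array of redshift-space positions
    box    : side length of the cube (same units as pos)
    b_perp : perpendicular linking length, in units of the mean separation
    b_z    : line-of-sight linking length, in the same units
    Returns an integer group label for every galaxy.
    """
    pos = np.asarray(pos, dtype=float)
    n = len(pos)
    mean_sep = box / n ** (1.0 / 3.0)        # n_bar^(-1/3)
    l_perp, l_z = b_perp * mean_sep, b_z * mean_sep

    parent = np.arange(n)                    # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    for i in range(n - 1):
        d = np.abs(pos[i + 1:] - pos[i])
        d = np.minimum(d, box - d)           # periodic wrap
        linked = (np.hypot(d[:, 0], d[:, 1]) <= l_perp) & (d[:, 2] <= l_z)
        for j in np.nonzero(linked)[0] + i + 1:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri              # merge the two groups

    roots = np.array([find(i) for i in range(n)])
    return np.unique(roots, return_inverse=True)[1]
```

Calling it with `b_perp=0.14, b_z=0.75` and then with a much smaller `b_z` on the same mock shows the splitting of elongated systems described above.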
Recovering an unbiased multiplicity function does not 
guarantee that the one-to-one relation between the mul- 
tiplicities of halos and their recovered groups is also un- 
biased. We therefore also investigate this relation. As 
described in § 2, we associate each halo to the recovered 
group that contains the halo's central galaxy. Groups 
that contain central galaxies from more than one halo 
are associated with the halo with which they share the 
largest number of galaxies. Halos that end up not being 
associated with any group are considered "undetected," 
and groups that are not associated with any halo (i.e., 
they contain no halo central galaxies) are considered 
"spurious." Once we have associated mock groups one- 
to-one with their parent halos, we can look at the relation 
between the halo and group multiplicities (i.e., N_true vs. 
N_obs). In addition, we can look at the fraction of ha- 
los that are detected and the fraction of groups that are 
spurious. Figure 15 shows how these relations depend 
on the line-of-sight linking length. The bottom panel of 
the figure shows one set of linking lengths (b⊥ = 0.14, 
bz = 0.70) that yields an unbiased median relation be- 
tween N_true and N_obs, but the scatter around this rela- 
tion is large and quite asymmetric. 90% of groups at a 
given N_obs are associated with halos that have up to 40% 
higher and 60% lower N_true. Increasing the line-of-sight 
linking length causes groups to grow and thus biases the 
median N_true vs. N_obs relation by tilting it toward larger 
N_obs. As before, we record all linking length combina- 
tions that yield unbiased median relations between group 
and halo multiplicities, and we show the successful pa- 
rameter space in Figure 3. 
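
The association rule described above (match through the halo's central galaxy, and break multiple-central cases by the largest number of shared galaxies) can be sketched as follows. The function name and the per-galaxy input arrays are hypothetical. The sketch assumes every halo has exactly one galaxy flagged as central, and ties in the shared-galaxy count are broken arbitrarily.

```python
import numpy as np
from collections import Counter, defaultdict


def match_groups_to_halos(halo_id, group_id, is_central):
    """One-to-one halo/group association via halo central galaxies.

    halo_id, group_id : per-galaxy integer labels (true halo, recovered group)
    is_central        : per-galaxy boolean, True for each halo's central galaxy
    Returns (matches, undetected, spurious); matches maps group -> halo.
    """
    halo_id = np.asarray(halo_id)
    group_id = np.asarray(group_id)
    is_central = np.asarray(is_central, dtype=bool)

    # Candidate halos for each group: those whose central galaxy it contains.
    candidates = defaultdict(list)
    for h, g in zip(halo_id[is_central], group_id[is_central]):
        candidates[int(g)].append(int(h))

    matches = {}
    for g, halos in candidates.items():
        # Count how many of the group's galaxies come from each halo.
        shared = Counter(halo_id[group_id == g].tolist())
        matches[g] = max(halos, key=lambda h: shared[h])

    undetected = set(halo_id.tolist()) - set(matches.values())
    spurious = set(group_id.tolist()) - set(matches)
    return matches, undetected, spurious
```

From the returned sets, the halo completeness is the fraction of halos not in `undetected`, and the spurious fraction is the fraction of groups in `spurious`. Both can be binned by multiplicity to reproduce the kind of curves discussed here.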

The top panel of Figure 15 shows the completeness 
(fraction of halos that are associated one-to-one with 
groups) as a function of halo multiplicity N_true, and the 
middle panel shows the spurious group fraction as a func- 
tion of group multiplicity N_obs. Over a wide range of FoF 
linking lengths, the completeness for halos with N > 5 
is over 95%, and the spurious fraction for groups with 
N > 5 is less than 5%. Increasing the line-of-sight link- 
ing length causes a drop in the halo completeness and a 
corresponding drop in the spurious group fraction, since 
more halos get linked to the same groups. For the final 
linking lengths that we use (see § 0), the halo complete- 
ness is greater than 97% and the spurious group fraction 
less than 1% for N ≥ 10. The high completeness and 
low spurious fraction are a result of how we associate 
groups to halos. Since we only require a group to have 
a halo's central galaxy in order to be associated with 
it, most groups and halos have one-to-one associations. 
If we used a more stringent criterion for group-halo as- 
sociation, for example by requiring that a group contain 
some minimum fraction of a halo's galaxies, then the halo 
completeness would be lower and the spurious group frac- 
tion higher, but the scatter in N_true vs. N_obs would be 
reduced. The three panels of Figure 15, put together, 
characterize the errors in the FoF group finder. Chang- 
ing the definition for how groups are associated to halos 
does not change the errors in group-finding; it merely 
redistributes the errors among the three panels. 

In addition to requiring that our groups have unbiased 
abundances and multiplicities, we would also like them to 
have unbiased size distributions. For every group in our 
redshift-space cube mocks, we measure the projected rms 
radius and the line-of-sight velocity dispersion of galax- 
ies. We compare these to the projected rms radii and 

Fig. 16. — Effect of changing the FoF linking lengths on the 
size distribution of groups in redshift space, measured using mock 
galaxy catalogs. The top panel shows the projected 2-dimensional 
rms group radius distribution as a function of group richness N. 
The bottom panel shows the same for the 1-dimensional line-of- 
sight velocity dispersion σ_v. In both panels, the black curves and 
shading show the size distributions of galaxy systems that occupy 
the same dark matter halo and thus represent the "true" cases. The 
sets of colored curves and shadings show the size distributions of 
recovered groups for fixed perpendicular and three different line- 
of-sight linking lengths, which are listed in the bottom panel in 
units of the mean inter-galaxy separation. Middle curves show the 
median relation and outer curves show the 10 and 90 percentiles. 
The area between these outer curves is shaded. All results are 
averaged over twenty mock galaxy catalogs. 

actual velocity dispersions of halo galaxies. Figure 16 
shows the median, 10th, and 90th percentile projected 
size and velocity dispersion as a function of multiplicity 
for halos, compared to that for groups identified with 
two different line-of-sight linking lengths. Increasing the 
line-of-sight linking length produces groups with higher 
velocity dispersions, but it has less impact on the pro- 
jected size distributions. The opposite is naturally true 
when we increase the perpendicular linking length. Link- 
ing length combinations that yield groups with unbiased 
abundances and projected sizes tend to yield velocity dis- 
persions that are biased low. This is illustrated in Fig- 
ure 16, which shows that the linking length combination 
b⊥ = 0.14, bz = 0.7 yields groups with velocity disper- 
sions that are ~20% too low relative to halos. The 
line-of-sight linking length must be more than doubled 
to repair this bias, but then the abundances of groups 
would be too high. 
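
The two size statistics compared in this test can be computed per group as in the sketch below. The exact estimators are an assumption here: rms projected distance from the group centroid, and the (N - 1) sample standard deviation of the line-of-sight velocities. The function name is hypothetical, and periodic wrapping of positions is ignored for brevity.

```python
import numpy as np


def group_sizes(x, y, v_los):
    """Projected rms radius and line-of-sight velocity dispersion of one group.

    x, y  : member positions in the plane of the sky (e.g. h^-1 Mpc)
    v_los : member line-of-sight velocities (e.g. km/s)
    """
    x, y, v_los = (np.asarray(a, dtype=float) for a in (x, y, v_los))
    # rms projected distance of the members from the group centroid
    r2 = (x - x.mean()) ** 2 + (y - y.mean()) ** 2
    r_rms = np.sqrt(r2.mean())
    # (N - 1) estimator of the one-dimensional velocity dispersion
    sigma_v = v_los.std(ddof=1)
    return r_rms, sigma_v
```

Applying the same function to the galaxies of each halo and to the galaxies of each recovered group, then taking percentiles in bins of multiplicity, gives the kind of comparison described in this paragraph.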

Figure 3 shows the linking length parameter space that 
satisfies each of the above tests. As discussed in § 0, there 

Fig. 17. — Same as Fig. 3 but using the Mr20b set of mock 
catalogs, which are constructed with a different input relation be- 
tween halo richness and dark matter halo mass, as described in 
§11. 

Fig. 18. — Same as Fig. 3 but using the Mr19 set of mock 
catalogs, described in § 2. 

is no combination of perpendicular and line-of-sight link- 
ing lengths that yields groups with unbiased abundances, 
projected sizes, and velocity dispersions, even at high 



multiplicity. We choose to sacrifice velocity dispersions 
and adopt the parameters b⊥ = 0.14, bz = 0.75. All the 
above tests and resulting choice of linking lengths were 
done using the Mr20 mock catalogs. Since we plan to 
use our group catalog to constrain the HOD, it is vital 
that our choice of linking lengths does not depend sensi- 
tively on the input HOD assumed when constructing the 
mocks. For this reason, we repeat all the above tests with 
the Mr20b mock catalogs, which use a different input 
HOD to model the same Mr20 sample of SDSS galaxies. 
The results are shown in Figure 17. It is clear that our 
adopted group finder performs equally well in both sets of 
mock catalogs, demonstrating that our choice of linking 
lengths is insensitive to the underlying HOD. It is also 

Fig. 19. — Same as Fig. 3 but using the Mr18 set of mock 
catalogs, described in § 2. 

important to show how well our linking lengths work on 
lower luminosity galaxy samples, since we apply them to 
the SDSS Mr19 and Mr18 samples. We thus repeat our 
mock tests with the Mr19 and Mr18 mock catalogs and 
show the results in Figures 18 and 19, respectively. The 
figures show that lower luminosity (higher density) sam- 
ples require slightly higher line-of-sight linking lengths in 
order to retain unbiased multiplicity functions. However, 
this effect is small. When applied to the Mr18 mock 
catalogs, our adopted linking lengths yield a multiplic- 
ity function that is 10% too low in amplitude. Overall, 
Figures 3, 17, 18, and 19 demonstrate that our choice of 
linking lengths is fairly robust. 
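
A statement such as "a multiplicity function that is 10% too low in amplitude" refers to the ratio of the recovered group abundance to the true halo abundance in bins of multiplicity. A minimal sketch of that ratio follows, assuming per-galaxy halo and group labels for one mock. The function names and the binning are hypothetical.

```python
import numpy as np


def multiplicity_function(member_ids, volume, edges):
    """Abundance of systems per unit volume in bins of multiplicity N."""
    _, n_members = np.unique(member_ids, return_counts=True)
    counts, _ = np.histogram(n_members, bins=edges)
    return counts / volume


def multiplicity_bias(halo_id, group_id, volume, edges):
    """Ratio of group to halo abundance per multiplicity bin (1 = unbiased)."""
    n_halo = multiplicity_function(halo_id, volume, edges)
    n_group = multiplicity_function(group_id, volume, edges)
    with np.errstate(divide="ignore", invalid="ignore"):
        return n_group / n_halo   # inf or nan where no halos fall in a bin
```

Averaging this ratio over many mock catalogs, as done for the figures in this appendix, reduces the noise in the sparsely populated high-N bins.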